Docket No. : GTC03-02 



Date: Jl^ /&, **o3 
EXPRESS MAIL LABEL NO. EL442001705 US 



NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO 
STREPTOCOCCUS PNEUMONIAE FOR DIAGNOSTICS AND THERAPEUTICS 



This application claims priority of U.S. provisional applications 60/051553, filed
July 2, 1997; and 60/085131, filed May 12, 1998, both of which are hereby incorporated
herein by reference in their entirety.



Field Of The Invention 
The invention relates to isolated nucleic acids and polypeptides derived from
Streptococcus pneumoniae that are useful as molecular targets for diagnostics,
prophylaxis and treatment of pathological conditions, as well as materials and methods 
for the diagnosis, prevention, and amelioration of pathological conditions resulting from 
bacterial infection. 


Background Of The Invention 

Streptococcus pneumoniae (S. pneumoniae) is a common, spherical, gram- 
positive bacterium. Worldwide it is a leading cause of illness among children, the elderly, 
and individuals with debilitating medical conditions (Breiman, R. F. et al., 1994, JAMA
271: 1831). S. pneumoniae is estimated to be the causal agent in 3,000 cases of
meningitis, 50,000 cases of bacteremia, 500,000 cases of pneumonia, and 7,000,000
cases of otitis media annually in the United States alone (Reichler, M. R. et al., 1992, J.
Infect. Dis. 166: 1346; Stool, S. E. and Field, M. J., 1989 Pediatr. Infect. Dis. J. 8: S11).
In the United States alone, 40,000 deaths result annually from S. pneumoniae infections
(Williams, W. W. et al., 1988 Ann. Intern. Med. 108: 616) with a death rate approaching
30% from bacteremia (Butler, J. C. et al., 1993, JAMA 270: 1826). Pneumococcal 
pneumonia is a serious problem among the elderly of industrialized nations (Kayhty, H. 
and Eskola, J., 1996 Emerg. Infect. Dis. 2: 289) and is a leading cause of death among 
children in developing nations (Kayhty, H. and Eskola, J., 1996 Emerg. Infect. Dis. 2:
289; Stansfield, S. K., 1987 Pediatr. Infect. Dis. 6: 622).

Vaccines against S. pneumoniae have been available for a number of years. There
are a large number of serotypes based on the polysaccharide capsule (van Dam, J. E.,
Fleer, A., and Snippe, H., 1990 Antonie van Leeuwenhoek 58: 1) although only a fraction
of the serotypes seem to be associated with infections (Martin, D. R. and Brett, M. S.,
1996 N. Z. Med. J. 109: 288). A multivalent vaccine against capsular polysaccharides of
23 serotypes (Smart, L. E., Dougall, A. J. and Gridwood, R. W., 1987 J. Infect. 14: 209)
has provided protection for some groups but not for several groups at risk for
pneumococcal infections, such as infants and the elderly (Makel, P. H. et al., 1980 Lancet
2: 547; Sankilampi, U., 1996 J. Infect. Dis. 173: 387). Conjugated pneumococcal
capsular polysaccharide vaccines have somewhat improved efficacy, but are costly and,
therefore, are not likely to be in widespread use (Kayhty, H. and Eskola, J., 1996
Emerg. Infect. Dis. 2: 289).

At one time, S. pneumoniae strains were uniformly susceptible to penicillin. The
report of a penicillin-resistant strain of S. pneumoniae (Hansman, D. and Bullen, M. M.,
1967 Lancet 1: 264) was followed rapidly by many reports indicating the worldwide
emergence of penicillin-resistant and penicillin non-susceptible strains (Klugman, K. P.,
1990 Clin. Microbiol. Rev. 3: 171). S. pneumoniae strains which are resistant to multiple
antibiotics (including penicillin) have also been observed recently within the United States
(Welby, P. L., 1994 Pediatr. Infect. Dis. J. 13: 281; Ducin, J. S. et al., 1995 Pediatr. Infect.
Dis. J. 14: 745; Butler, J. C., 1996 J. Infect. Dis. 174: 986) as well as internationally
(Boswell, T. C. et al., 1996 J. Infect. 33: 17; Catchpole, C., Fraise, A., and Wise, R., 1996
Microb. Drug Resist. 2: 431; Tarasi, A. et al., 1997 Microb. Drug Resist. 3: 105).

A high incidence of morbidity is associated with invasive S. pneumoniae 
infections (Williams, W. W. et al., 1988 Ann. Intern. Med. 108: 616). Because of the 
incomplete effectiveness of currently available vaccines and antibiotics, the identification 
of new targets for antimicrobial therapies, including, but not limited to, the design of
vaccines and antibiotics that may help prevent infection or that may be useful in
fighting existing infections, is highly desirable.



Summary Of The Invention 

The present invention fulfills the need for diagnostic tools and therapeutics by
providing bacterial-specific compositions and methods for detecting, treating, and
preventing bacterial infection, in particular S. pneumoniae infection.

The present invention encompasses isolated polypeptides and nucleic acids
derived from S. pneumoniae that are useful as reagents for diagnosis of bacterial
infection, components of effective antibacterial vaccines, and/or as targets for
antibacterial drugs, including anti-S. pneumoniae drugs. The nucleic acids and peptides
of the present invention also have utility for diagnostics and therapeutics for S.
pneumoniae and other Streptococcus species. They can also be used to detect the
presence of S. pneumoniae and other Streptococcus species in a sample; and in screening
compounds for the ability to interfere with the S. pneumoniae life cycle or to inhibit S.
pneumoniae infection. More specifically, this invention features compositions of nucleic
acids corresponding to entire coding sequences of S. pneumoniae proteins, including
surface or secreted proteins or parts thereof, nucleic acids capable of binding mRNA
from S. pneumoniae proteins to block protein translation, and methods for producing S.
pneumoniae proteins or parts thereof using peptide synthesis and recombinant DNA
techniques. This invention also features antibodies and nucleic acids useful as probes to
detect S. pneumoniae infection. In addition, vaccine compositions and methods for the
protection or treatment of infection by S. pneumoniae are within the scope of this
invention.

The nucleotide sequences provided in SEQ ID NO: 1 - SEQ ID NO: 2603, a
fragment thereof, or a nucleotide sequence at least 99.5% identical to a sequence
contained within SEQ ID NO: 1 - SEQ ID NO: 2603 may be "provided" in a variety of
media to facilitate use thereof. As used herein, "provided" refers to a manufacture,
other than an isolated nucleic acid molecule, which contains a nucleotide sequence of the
present invention, i.e., the nucleotide sequence provided in SEQ ID NO: 1 - SEQ ID NO:
2603, a fragment thereof, or a nucleotide sequence at least 99.5% identical to a
sequence contained within SEQ ID NO: 1 - SEQ ID NO: 2603. Uses for and methods
for providing nucleotide sequences in a variety of media are well known in the art (see
e.g., EPO Publication No. EP 0 756 006).

In one application of this embodiment, a nucleotide sequence of the present
invention can be recorded on computer readable media. As used herein, "computer
readable media" refers to any media which can be read and accessed directly by a
computer. Such media include, but are not limited to: magnetic storage media, such as
floppy discs, hard disc storage media, and magnetic tape; optical storage media such as
CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these
categories such as magnetic/optical storage media. A person skilled in the art can
readily appreciate how any of the presently known computer readable media can be used
to create a manufacture comprising computer readable media having recorded thereon a
nucleotide sequence of the present invention.

As used herein, "recorded" refers to a process for storing information on 
computer readable media. A person skilled in the art can readily adopt any of the 
presently known methods for recording information on computer readable media to 
generate manufactures comprising the nucleotide sequence information of the present
invention.

A variety of data storage structures are available to a person skilled in the art for
creating a computer readable medium having recorded thereon a nucleotide sequence of the
present invention. The choice of the data storage structure will generally be based on the
means chosen to access the stored information. In addition, a variety of data processor
programs and formats can be used to store the nucleotide sequence information of the
present invention on computer readable media. The sequence information can be
represented in a word processing text file, formatted in commercially available software
such as WordPerfect and Microsoft Word, or represented in the form of an ASCII file,
stored in a database application, such as DB2, Sybase, Oracle, or the like. A person
skilled in the art can readily adapt any number of data processor structuring formats (e.g.,
text file or database) in order to obtain computer readable media having recorded thereon
the nucleotide sequence information of the present invention.
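
By way of illustration only, recording sequence information as a plain ASCII text file might be sketched as follows. The FASTA-style layout, the function names, the identifier "SEQ_ID_NO_1", and the short placeholder sequence are assumptions made for this sketch; none of them is taken from the Sequence Listing or prescribed by the present specification.

```python
import io

def write_fasta(records, handle, width=60):
    """Write (identifier, sequence) pairs as FASTA-style ASCII text."""
    for identifier, sequence in records:
        handle.write(f">{identifier}\n")
        # Wrap the sequence at a fixed width so the file stays readable.
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")

def read_fasta(handle):
    """Read FASTA-style text back into a dict of identifier -> sequence."""
    records, current = {}, None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:]
            records[current] = ""
        elif current is not None:
            records[current] += line
    return records

# Placeholder sequence for illustration; not taken from the Sequence Listing.
buffer = io.StringIO()
write_fasta([("SEQ_ID_NO_1", "ATGGCTAAAGGTGAATTAACCTGA")], buffer, width=10)
buffer.seek(0)
restored = read_fasta(buffer)
```

An in-memory buffer stands in for the computer readable medium here; the same two functions would work unchanged with a file opened on disk.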

By providing the nucleotide sequence of SEQ ID NO: 1 - SEQ ID NO: 2603, a
fragment thereof, or a nucleotide sequence at least 99.5% identical to a sequence
contained within SEQ ID NO: 1 - SEQ ID NO: 2603 in computer readable form, a
person skilled in the art can routinely access the sequence information for a variety of
purposes. Computer software is publicly available which allows a person skilled in the
art to access sequence information provided in a computer readable medium. Examples of
such computer software include programs of the "Staden Package", "DNA Star",
"MacVector", GCG "Wisconsin Package" (Genetics Computer Group, Madison, WI) and
"NCBI toolbox" (National Center for Biotechnology Information).

Computer algorithms enable the identification of S. pneumoniae open reading
frames (ORFs) within SEQ ID NO: 1 - SEQ ID NO: 2603 which contain homology to
ORFs or proteins from other organisms. Examples of such similarity-search algorithms
include the BLAST [Altschul et al., J. Mol. Biol. 215:403-410 (1990)] and Smith-
Waterman [Smith and Waterman (1981) Advances in Applied Mathematics, 2:482-489]
search algorithms. These algorithms are utilized on computer systems as exemplified
below. The ORFs so identified represent protein encoding fragments within the S.
pneumoniae genome and are useful in producing commercially important proteins such
as enzymes used in fermentation reactions and in the production of commercially useful
metabolites.
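
For orientation, the notion of an open reading frame can be sketched with a simple start-to-stop codon scan over all six reading frames. This is a deliberately simpler technique than the homology-based identification described above, and it is not the method used in the present examples. The function names, the restriction to ATG start codons (bacteria also use alternative start codons), and the test sequence are assumptions made for this sketch.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq, min_codons=3):
    """Return (strand, start, end, orf) for ATG...stop stretches in six frames.

    start and end are 0-based offsets on the strand that was scanned, and
    the ORF length in codons includes the stop codon.
    """
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        results.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return results

# Hypothetical test sequence, chosen only to contain one short ORF.
orfs = find_orfs("CCATGAAATTTTAGGG")
```

Candidate ORFs found this way would still need a similarity search against known proteins before any function could be assigned to them.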

The present invention further provides systems, particularly computer-based
systems, which contain the sequence information described herein. Such systems are
designed to identify commercially important fragments of the S. pneumoniae genome.
As used herein, "a computer-based system" refers to the hardware means, software
means, and data storage means used to analyze the nucleotide sequence information of
the present invention. The minimum hardware means of the computer-based systems of
the present invention comprises a central processing unit (CPU), input means, output
means, and data storage means. A person skilled in the art can readily appreciate that any
one of the currently available computer-based systems is suitable for use in the present
invention. The computer-based systems of the present invention comprise a data storage
means having stored therein a nucleotide sequence of the present invention and the
necessary hardware means and software means for supporting and implementing a search
means. As used herein, "data storage means" refers to memory which can store
nucleotide sequence information of the present invention, or a memory access means
which can access manufactures having recorded thereon the nucleotide sequence
information of the present invention.

As used herein, "search means" refers to one or more programs which are
implemented on the computer-based system to compare a target sequence or target
structural motif with the sequence information stored within the data storage means.
Search means are used to identify fragments or regions of the S. pneumoniae genome
which are similar to, or "match", a particular target sequence or target motif. A variety of
algorithms are known in the art and have been disclosed publicly, and a variety of
commercially available software for conducting homology-based similarity searches is
available and can be used in the computer-based systems of the present invention.
Examples of such software include, but are not limited to, FASTA (GCG Wisconsin
Package), Bic_SW (Compugen Bioccelerator), BLASTN2, BLASTP2 and BLASTX2
(NCBI) and Motifs (GCG). A person skilled in the art can readily recognize
that any one of the available algorithms or implementing software packages for
conducting homology searches can be adapted for use in the present computer-based
systems.
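
As one illustration of a search means, the Smith-Waterman local alignment recurrence can be sketched in a few lines. The scoring values (match, mismatch, and a linear gap penalty) and the function name are assumptions chosen for this sketch; they are not the parameters of Bic_SW or of any other software package named above, which typically use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Uses the Smith-Waterman recurrence with a linear gap penalty and keeps
    only two rows of the dynamic programming matrix in memory.
    """
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor of 0.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best
```

A higher score indicates a longer or better-conserved shared region, so the score can serve as the measure by which database fragments are compared with a target sequence.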

As used herein, a "target sequence" can be any DNA or amino acid sequence of
six or more nucleotides or two or more amino acids. A person skilled in the art can
readily recognize that the longer a target sequence is, the less likely a target sequence will
be present as a random occurrence in the database. The most preferred sequence length of
a target sequence is from about 10 to 100 amino acids or from about 30 to 300
nucleotide residues. However, it is well recognized that many genes are longer than 500
amino acids, or 1.5 kb in length, and that commercially important fragments of the S.
pneumoniae genome, such as sequence fragments involved in gene expression and
protein processing, will often be shorter than 30 nucleotides.
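
The relationship between target length and chance occurrence can be made concrete with a rough estimate. The sketch below assumes a single-stranded database in which every residue is equally likely and independent of its neighbors, which real genomes only approximate; the function name and the database sizes are hypothetical.

```python
def expected_random_matches(target_length, database_length, alphabet_size=4):
    """Expected number of exact chance matches to one specific target.

    Assumes independent, uniformly distributed residues: each of the
    (database_length - target_length + 1) start positions matches with
    probability alphabet_size ** -target_length.
    """
    positions = max(database_length - target_length + 1, 0)
    return positions * alphabet_size ** (-target_length)

# Hypothetical database of 2,000,000 nucleotides.
six_mer = expected_random_matches(6, 2_000_000)
thirty_mer = expected_random_matches(30, 2_000_000)
```

Under these assumptions a six-nucleotide target is expected to occur by chance hundreds of times in a database of that size, whereas a thirty-nucleotide target is expected essentially never, which is consistent with the preference for longer target sequences stated above.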

As used herein, "a target structural motif," or "target motif," refers to any
rationally selected sequence or combination of sequences in which the sequence(s) are
chosen based on a specific functional domain or three-dimensional configuration which
is formed upon the folding of the target polypeptide. There are a variety of target motifs
known in the art. Protein target motifs include, but are not limited to, enzymatic active
sites, membrane spanning regions, and signal sequences. Nucleic acid target motifs
include, but are not limited to, promoter sequences, hairpin structures and inducible
expression elements (protein binding sequences).

A variety of structural formats for the input and output means can be used to input
and output the information in the computer-based systems of the present invention. A
preferred format for an output means ranks fragments of the S. pneumoniae genome
possessing varying degrees of homology to the target sequence or target motif. Such
presentation provides a person skilled in the art with a ranking of sequences which
contain various amounts of the target sequence or target motif and identifies the degree of
homology contained in the identified fragment.

A variety of comparing means can be used to compare a target sequence or target
motif with the data storage means to identify sequence fragments of the S. pneumoniae
genome. In the present examples, implementing software which implements the
BLASTP2 and bic_SW algorithms (Altschul et al., J. Mol. Biol. 215:403-410 (1990);
Compugen Bioccelerator) was used to identify open reading frames within the S.
pneumoniae genome. A person skilled in the art can readily recognize that any one of the
publicly available homology search programs can be used as the search means for the
computer-based systems of the present invention.
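
The comparison of a target sequence with stored sequence, together with the ranked output format described above, might be sketched as follows. The ungapped sliding-window comparison used here is a simplification chosen for brevity and is not the BLASTP2 or bic_SW algorithm; the function name and the short sequences are hypothetical.

```python
def rank_fragments(genome, target, top=3):
    """Rank every window of len(target) in genome by percent identity.

    Returns up to `top` tuples of (percent_identity, start, fragment),
    highest identity first; ties are ordered by start position.
    """
    k = len(target)
    hits = []
    for start in range(len(genome) - k + 1):
        window = genome[start:start + k]
        identical = sum(x == y for x, y in zip(window, target))
        hits.append((100.0 * identical / k, start, window))
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return hits[:top]

# Hypothetical stored sequence and target.
ranked = rank_fragments("TTACGTTTACCTTT", "ACGT")
```

Each entry of the result pairs a fragment with its degree of identity to the target, so the list can be presented directly as a ranking.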

The invention features S. pneumoniae polypeptides, preferably a substantially
pure preparation of an S. pneumoniae polypeptide, or a recombinant S. pneumoniae
polypeptide. In preferred embodiments: the polypeptide has biological activity; the
polypeptide has an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 98%, or
99% identical to an amino acid sequence of the invention contained in the Sequence
Listing, preferably it has about 65% sequence identity with an amino acid sequence of the
invention contained in the Sequence Listing, and most preferably it has about 92% to
about 99% sequence identity with an amino acid sequence of the invention contained in
the Sequence Listing; the polypeptide has an amino acid sequence essentially the same as
an amino acid sequence of the invention contained in the Sequence Listing; the
polypeptide is at least 5, 10, 20, 50, 100, or 150 amino acid residues in length; the
polypeptide includes at least 5, preferably at least 10, more preferably at least 20, more
preferably at least 50, 100, or 150 contiguous amino acid residues of the invention
contained in the Sequence Listing. In yet another preferred embodiment, the amino acid
sequence which differs in sequence identity by about 7% to about 8% from the S.
pneumoniae amino acid sequences of the invention contained in the Sequence Listing is
also encompassed by the invention.
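
For illustration, a percent identity figure of the kind recited above can be computed from two sequences that have already been aligned. The sketch assumes one common convention (identical columns divided by all aligned columns, with gap characters written as "-"); other conventions for the denominator exist and the specification does not fix one here. The function name and the peptide strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns in which both sequences carry a gap are ignored; a column with
    a gap in only one sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)

# Hypothetical ten-residue peptides differing at one position.
identity = percent_identity("MKTAYIAKQR", "MKTAYLAKQR")
meets_90_percent = identity >= 90.0
```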

In preferred embodiments: the S. pneumoniae polypeptide is encoded by a 
nucleic acid of the invention contained in the Sequence Listing, or by a nucleic acid 
having at least 60%, 70%, 80%, 90%, 95%, 98%, or 99% homology with a nucleic acid 
of the invention contained in the Sequence Listing. 

In a preferred embodiment, the subject S. pneumoniae polypeptide differs in
amino acid sequence at 1, 2, 3, 5, 10 or more residues from a sequence of the invention
contained in the Sequence Listing. The differences, however, are such that the S.
pneumoniae polypeptide exhibits an S. pneumoniae biological activity, e.g., the S.
pneumoniae polypeptide retains a biological activity of a naturally occurring S.
pneumoniae enzyme.

In preferred embodiments, the polypeptide includes all or a fragment of an amino
acid sequence of the invention contained in the Sequence Listing; fused, in reading
frame, to additional amino acid residues, preferably to residues encoded by genomic
DNA 5' or 3' to the genomic DNA which encodes a sequence of the invention contained
in the Sequence Listing.

In yet other preferred embodiments, the S. pneumoniae polypeptide is a
recombinant fusion protein having a first S. pneumoniae polypeptide portion and a
second polypeptide portion, e.g., a second polypeptide portion having an amino acid
sequence unrelated to S. pneumoniae. The second polypeptide portion can be, e.g., any
of glutathione-S-transferase, a DNA binding domain, or a polymerase activating domain.
In a preferred embodiment the fusion protein can be used in a two-hybrid assay.

Polypeptides of the invention include those which arise as a result of alternative
transcription events, alternative RNA splicing events, and alternative translational and
posttranslational events.

In a preferred embodiment, the encoded S. pneumoniae polypeptide differs (e.g.,
by amino acid substitution, addition or deletion of at least one amino acid residue) in
amino acid sequence at 1, 2, 3, 5, 10 or more residues, from a sequence of the invention
contained in the Sequence Listing. The differences, however, are such that: the S.
pneumoniae encoded polypeptide exhibits an S. pneumoniae biological activity, e.g., the
encoded S. pneumoniae enzyme retains a biological activity of a naturally occurring S.
pneumoniae enzyme.

In preferred embodiments, the encoded polypeptide includes all or a fragment of
an amino acid sequence of the invention contained in the Sequence Listing; fused, in
reading frame, to additional amino acid residues, preferably to residues encoded by
genomic DNA 5' or 3' to the genomic DNA which encodes a sequence of the invention
contained in the Sequence Listing.

The S. pneumoniae strain, 14453, from which genomic sequences have been
sequenced, was deposited on June 26, 1997 with the American Type Culture
Collection and assigned the ATCC designation # 55987.

Included in the invention are: allelic variations; natural mutants; induced mutants;
proteins encoded by DNA that hybridizes under high or low stringency conditions to a
nucleic acid which encodes a polypeptide of the invention contained in the Sequence
Listing (for definitions of high and low stringency see Current Protocols in Molecular
Biology, John Wiley & Sons, New York, 1989, 6.3.1 - 6.3.6, hereby incorporated by
reference); and, polypeptides specifically bound by antisera to S. pneumoniae
polypeptides, especially by antisera to an active site or binding domain of an S. pneumoniae
polypeptide. The invention also includes fragments, preferably biologically active
fragments. These and other polypeptides are also referred to herein as S. pneumoniae
polypeptide analogs or variants.

The invention further provides nucleic acids, e.g., RNA or DNA, encoding a 
polypeptide of the invention. This includes double stranded nucleic acids as well as 
coding and antisense single strands. 

In preferred embodiments, the subject S. pneumoniae nucleic acid will include a
transcriptional regulatory sequence, e.g., at least one of a transcriptional promoter or
transcriptional enhancer sequence, operably linked to the S. pneumoniae gene sequence,
e.g., to render the S. pneumoniae gene sequence suitable for expression in a recombinant
host cell.

In yet a further preferred embodiment, the nucleic acid which encodes an S.
pneumoniae polypeptide of the invention hybridizes under stringent conditions to a
nucleic acid probe corresponding to at least 8 consecutive nucleotides of the invention
contained in the Sequence Listing; more preferably to at least 12 consecutive nucleotides
of the invention contained in the Sequence Listing; more preferably to at least 20
consecutive nucleotides of the invention contained in the Sequence Listing; more
preferably to at least 40 consecutive nucleotides of the invention contained in the
Sequence Listing.

In another aspect, the invention provides a substantially pure nucleic acid having
a nucleotide sequence which encodes an S. pneumoniae polypeptide. In preferred
embodiments: the encoded polypeptide has biological activity; the encoded polypeptide
has an amino acid sequence at least 60%, 70%, 80%, 90%, 95%, 98%, or 99%
homologous to an amino acid sequence of the invention contained in the Sequence
Listing; the encoded polypeptide has an amino acid sequence essentially the same as an
amino acid sequence of the invention contained in the Sequence Listing; the encoded
polypeptide is at least 5, 10, 20, 50, 100, or 150 amino acids in length; the encoded
polypeptide comprises at least 5, preferably at least 10, more preferably at least 20, more
preferably at least 50, 100, or 150 contiguous amino acids of the invention contained in
the Sequence Listing.

In another aspect, the invention encompasses: a vector including a nucleic acid
which encodes an S. pneumoniae polypeptide or an S. pneumoniae polypeptide variant as
described herein; a host cell transfected with the vector; and a method of producing a
recombinant S. pneumoniae polypeptide or S. pneumoniae polypeptide variant; including
culturing the cell, e.g., in a cell culture medium, and isolating an S. pneumoniae
polypeptide or an S. pneumoniae polypeptide variant, e.g., from the cell or from the cell
culture medium.

In another series of embodiments, the invention provides isolated nucleic acids
comprising sequences at least about 8 nucleotides in length, more preferably at least
about 12 nucleotides in length, and most preferably at least about 15-20 nucleotides in
length, that correspond to a subsequence of any one of SEQ ID NO: 1 - SEQ ID NO:
2603 or complements thereof. Alternatively, the nucleic acids comprise sequences
contained within any ORF (open reading frame), including a complete protein-coding
sequence, of which any of SEQ ID NO: 1 - SEQ ID NO: 2603 forms a part. The
invention encompasses sequence-conservative variants and function-conservative
variants of these sequences. The nucleic acids may be DNA, RNA, DNA/RNA duplexes,
protein-nucleic acid (PNA), or derivatives thereof.

In another aspect, the invention features a purified recombinant nucleic acid
having at least 50%, 60%, 70%, 80%, 90%, 95%, 98%, or 99% homology with a
sequence of the invention contained in the Sequence Listing.

In another aspect, the invention features nucleic acids capable of binding mRNA
of S. pneumoniae. Such nucleic acid is capable of acting as antisense nucleic acid to
control the translation of mRNA of S. pneumoniae. A further aspect features a nucleic
acid which is capable of binding specifically to an S. pneumoniae nucleic acid. These
nucleic acids are also referred to herein as complements and have utility as probes and as
capture reagents.

In another aspect, the invention features an expression system comprising an open
reading frame corresponding to S. pneumoniae nucleic acid. The nucleic acid further
comprises a control sequence compatible with an intended host. The expression system
is useful for making polypeptides corresponding to S. pneumoniae nucleic acid.

In another aspect, the invention features a cell transformed with the expression
system to produce S. pneumoniae polypeptides.

In yet another embodiment, the invention encompasses reagents for detecting
bacterial infection, including S. pneumoniae infection, which comprise at least one S.
pneumoniae-derived nucleic acid defined by any one of SEQ ID NO: 1 - SEQ ID NO:
2603, or sequence-conservative or function-conservative variants thereof. Alternatively,
the diagnostic reagents comprise polypeptide sequences that are contained within any
open reading frames (ORFs), including complete protein-coding sequences, contained
within any of SEQ ID NO: 1 - SEQ ID NO: 2603, or polypeptide sequences contained
within any of SEQ ID NO: 2604 - SEQ ID NO: 5206, or polypeptides of which any of
the above sequences forms a part, or antibodies directed against any of the above peptide
sequences or function-conservative variants and/or fragments thereof.

The invention further provides antibodies, preferably monoclonal antibodies,
which specifically bind to the polypeptides of the invention. Methods are also provided
for producing antibodies in a host animal. The methods of the invention comprise
immunizing an animal with at least one S. pneumoniae-derived immunogenic
component, wherein the immunogenic component comprises one or more of the
polypeptides encoded by any one of SEQ ID NO: 1 - SEQ ID NO: 2603 or sequence-
conservative or function-conservative variants thereof; or polypeptides that are contained
within any ORFs, including complete protein-coding sequences, of which any of SEQ ID
NO: 1 - SEQ ID NO: 2603 forms a part; or polypeptide sequences contained within any
of SEQ ID NO: 2604 - SEQ ID NO: 5206; or polypeptides of which any of SEQ ID NO:
2604 - SEQ ID NO: 5206 forms a part. Host animals include any warm blooded animal,
including without limitation mammals and birds. Such antibodies have utility as
reagents for immunoassays to evaluate the abundance and distribution of S. pneumoniae-
specific antigens.

In yet another aspect, the invention provides a method for detecting bacterial
antigenic components in a sample, which comprises the steps of: (i) contacting a sample
suspected to contain a bacterial antigenic component with a bacterial-specific antibody,
under conditions in which a stable antigen-antibody complex can form between the
antibody and bacterial antigenic components in the sample; and (ii) detecting any
antigen-antibody complex formed in step (i), wherein detection of an antigen-antibody
complex indicates the presence of at least one bacterial antigenic component in the
sample. In different embodiments of this method, the antibodies used are directed
against a sequence encoded by any of SEQ ID NO: 1 - SEQ ID NO: 2603 or sequence-
conservative or function-conservative variants thereof, or against a polypeptide sequence
contained in any of SEQ ID NO: 2604 - SEQ ID NO: 5206 or function-conservative
variants thereof.

In yet another aspect, the invention provides a method for detecting antibacterial-
specific antibodies in a sample, which comprises: (i) contacting a sample suspected to
contain antibacterial-specific antibodies with an S. pneumoniae antigenic component,
under conditions in which a stable antigen-antibody complex can form between the S.
pneumoniae antigenic component and antibacterial antibodies in the sample; and (ii)
detecting any antigen-antibody complex formed in step (i), wherein detection of an
antigen-antibody complex indicates the presence of antibacterial antibodies in the
sample. In different embodiments of this method, the antigenic component is encoded by
a sequence contained in any of SEQ ID NO: 1 - SEQ ID NO: 2603 or sequence-
conservative and function-conservative variants thereof, or is a polypeptide sequence
contained in any of SEQ ID NO: 2604 - SEQ ID NO: 5206 or function-conservative
variants thereof.

In another aspect, the invention features a method of generating vaccines for
immunizing an individual against S. pneumoniae. The method includes: immunizing a
subject with an S. pneumoniae polypeptide, e.g., a surface or secreted polypeptide, or
active portion thereof, and a pharmaceutically acceptable carrier. Such vaccines have
therapeutic and prophylactic utilities.

In another aspect, the invention features a method of evaluating a compound, e.g.,
a polypeptide, e.g., a fragment of a host cell polypeptide, for the ability to bind an S.
pneumoniae polypeptide. The method includes: contacting the candidate compound with
an S. pneumoniae polypeptide and determining if the compound binds or otherwise
interacts with an S. pneumoniae polypeptide. Compounds which bind S. pneumoniae are
candidates as activators or inhibitors of the bacterial life cycle. These assays can be
performed in vitro or in vivo.

In another aspect, the invention features a method of evaluating a compound, e.g. 
a polypeptide, e.g., a fragment of a host cell polypeptide, for the ability to bind an S. 
pneumoniae nucleic acid, e.g., DNA or RNA. The method includes: contacting the
candidate compound with an S. pneumoniae nucleic acid and determining if the 
compound binds or otherwise interacts with an S. pneumoniae polypeptide. Compounds 
which bind S. pneumoniae are candidates as activators or inhibitors of the bacterial life 
cycle. These assays can be performed in vitro or in vivo. 


DETAILED DESCRIPTION OF THE INVENTION 

The sequences of the present invention include the specific nucleic acid and
amino acid sequences set forth in the Sequence Listing that forms a part of the present
specification, and which are designated SEQ ID NO: 1 - SEQ ID NO: 5206. Use of the
terms "SEQ ID NO: 1 - SEQ ID NO: 2603", "SEQ ID NO: 2604 - SEQ ID NO: 5206",
"the sequences depicted in Table 2", etc., is intended, for convenience, to refer to each
individual SEQ ID NO individually, and is not intended to refer to the genus of these
sequences. In other words, it is a shorthand for listing all of these sequences individually.
The invention encompasses each sequence individually, as well as any combination
thereof.

Definitions 

"Nucleic acid" or "polynucleotide" as used herein refers to purine- and 
pyrimidine-containing polymers of any length, either polyribonucleotides or 
polydeoxyribonucleotides or mixed polyribo-polydeoxyribo nucleotides. This includes
single- and double-stranded molecules, i.e., DNA-DNA, DNA-RNA and RNA-RNA 
hybrids, as well as "protein nucleic acids" (PNA) formed by conjugating bases to an 
amino acid backbone. This also includes nucleic acids containing modified bases. 

A nucleic acid or polypeptide sequence that is "derived from" a designated
sequence refers to a sequence that corresponds to a region of the designated sequence.
For nucleic acid sequences, this encompasses sequences that are homologous or
complementary to the sequence, as well as "sequence-conservative variants" and
"function-conservative variants." For polypeptide sequences, this encompasses
"function-conservative variants." Sequence-conservative variants are those in which a
change of one or more nucleotides in a given codon position results in no alteration in the
amino acid encoded at that position. Function-conservative variants are those in which a
given amino acid residue in a polypeptide has been changed without altering the overall
conformation and function of the native polypeptide, including, but not limited to,
replacement of an amino acid with one having similar physico-chemical properties (such
as, for example, acidic, basic, hydrophobic, and the like). "Function-conservative"
variants also include any polypeptides that have the ability to elicit antibodies specific to
a designated polypeptide.
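
By way of illustration only, the definition of a sequence-conservative variant given above can be sketched in code as follows. The sketch assumes in-frame DNA coding sequences of equal length and uses the standard genetic code; the function names are hypothetical and do not form part of the specification.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame DNA coding sequence; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def is_sequence_conservative_variant(reference, variant):
    """True if the nucleotides differ but every encoded amino acid is unchanged."""
    return (
        len(reference) == len(variant)
        and reference.upper() != variant.upper()
        and translate(reference) == translate(variant)
    )
```

For example, under these assumptions ATGGCTAAA and ATGGCCAAG both encode Met-Ala-Lys, so each is a sequence-conservative variant of the other, whereas ATGGATAAA (Met-Asp-Lys) is not.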

An "S. pneumoniae-derived" nucleic acid or polypeptide sequence may or may 
not be present in other bacterial species, and may or may not be present in all S. 
pneumoniae strains. This term is intended to refer to the source from which the sequence 
was originally isolated. Thus, an S. pneumoniae-derived polypeptide, as used herein, may
be used, e.g., as a target to screen for a broad spectrum antibacterial agent, to search for
homologous proteins in other species of bacteria or in eukaryotic organisms such as fungi 
and humans, etc. 

A purified or isolated polypeptide or a substantially pure preparation of a
polypeptide are used interchangeably herein and, as used herein, mean a polypeptide that
has been separated from other proteins, lipids, and nucleic acids with which it naturally
occurs. Preferably, the polypeptide is also separated from substances, e.g., antibodies or
gel matrix, e.g., polyacrylamide, which are used to purify it. Preferably, the polypeptide
constitutes at least 10, 20, 50, 70, 80 or 95% dry weight of the purified preparation.
Preferably, the preparation contains: sufficient polypeptide to allow protein sequencing;
at least 1, 10, or 100 mg of the polypeptide.

A purified preparation of cells refers to, in the case of plant or animal cells, an in 
vitro preparation of cells and not an entire intact plant or animal. In the case of cultured 
cells or microbial cells, it consists of a preparation of at least 10% and more preferably 
50% of the subject cells. 

A purified or isolated or a substantially pure nucleic acid, e.g., a substantially
pure DNA (terms used interchangeably herein), is a nucleic acid which is one or both
of the following: not immediately contiguous with both of the coding sequences with
which it is immediately contiguous (i.e., one at the 5' end and one at the 3' end) in the
naturally-occurring genome of the organism from which the nucleic acid is derived; or
which is substantially free of a nucleic acid with which it occurs in the organism from
which the nucleic acid is derived. The term includes, for example, a recombinant DNA
which is incorporated into a vector, e.g., into an autonomously replicating plasmid or
virus, or into the genomic DNA of a prokaryote or eukaryote, or which exists as a
separate molecule (e.g., a cDNA or a genomic DNA fragment produced by PCR or
restriction endonuclease treatment) independent of other DNA sequences. Substantially
pure DNA also includes a recombinant DNA which is part of a hybrid gene encoding
additional S. pneumoniae DNA sequence.

A "contig" as used herein is a nucleic acid representing a continuous stretch of 
genomic sequence of an organism. 

An "open reading frame", also referred to herein as ORF, is a region of nucleic 
acid which encodes a polypeptide. This region may represent a portion of a coding
sequence or a total sequence and can be determined from a stop to stop codon or from a
start to stop codon. 

As used herein, a "coding sequence" is a nucleic acid which is transcribed into 
messenger RNA and/or translated into a polypeptide when placed under the control of 
appropriate regulatory sequences. The boundaries of the coding sequence are determined 
by a translation start codon at the five prime terminus and a translation stop codon at the
three prime terminus. A coding sequence can include but is not limited to messenger 
RNA, synthetic DNA, and recombinant nucleic acid sequences. 

A "complement" of a nucleic acid as used herein refers to an anti-parallel or 
antisense sequence that participates in Watson-Crick base-pairing with the original 
sequence.
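
As a minimal sketch of this definition, assuming an unambiguous DNA sequence written 5' to 3', the anti-parallel complement can be computed as shown below. The function name is hypothetical.

```python
# Watson-Crick pairing for DNA: A pairs with T, G pairs with C.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the anti-parallel complement of a DNA strand, read 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]
```

Under these assumptions the complement of ATTGCC is GGCAAT, and taking the complement twice returns the original sequence.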

A "gene product" is a protein or structural RNA which is specifically encoded by
a gene.

As used herein, the term "probe" refers to a nucleic acid, peptide or other
chemical entity which specifically binds to a molecule of interest. Probes are often
associated with or capable of associating with a label. A label is a chemical moiety
capable of detection. Typical labels comprise dyes, radioisotopes, luminescent and
chemiluminescent moieties, fluorophores, enzymes, precipitating agents, amplification
sequences, and the like. Similarly, a nucleic acid, peptide or other chemical entity which
specifically binds to a molecule of interest and immobilizes such molecule is referred to
herein as a "capture ligand". Capture ligands are typically associated with or capable of
associating with a support such as nitro-cellulose, glass, nylon membranes, beads,
particles and the like. The specificity of hybridization is dependent on conditions such as
the base pair composition of the nucleotides, and the temperature and salt concentration
of the reaction. These conditions are readily discernible to one of ordinary skill in the art
using routine experimentation.

"Homologous" refers to the sequence similarity or sequence identity between two 
polypeptides or between two nucleic acid molecules. When a position in both of the two
compared sequences is occupied by the same base or amino acid monomer subunit, e.g.,
if a position in each of two DNA molecules is occupied by adenine, then the molecules
are homologous at that position. The percent of homology between two sequences is a
function of the number of matching or homologous positions shared by the two
sequences divided by the number of positions compared, multiplied by 100. For example,
if 6 of 10 of the positions in two sequences are matched or homologous then the two
sequences are 60% homologous. By way of example, the DNA sequences ATTGCC and
TATGGC share 50% homology. Generally, a comparison is made when two sequences are
aligned to give maximum homology.
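
The percent homology calculation described above may be sketched as follows. This is an illustration only; it assumes the two sequences have already been aligned, are of equal length, and contain no gaps, and the function name is hypothetical.

```python
def percent_homology(seq_a, seq_b):
    """Matching positions divided by positions compared, multiplied by 100."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("no positions to compare")
    # Count positions occupied by the same base or amino acid in both sequences.
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)
```

Applied to the example in the text, ATTGCC and TATGGC match at three of six positions (the third, fourth and sixth), which gives 50%.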

Nucleic acids are hybridizable to each other when at least one strand of a nucleic
acid can anneal to the other nucleic acid under defined stringency conditions. Stringency
of hybridization is determined by: (a) the temperature at which hybridization and/or
washing is performed; and (b) the ionic strength and polarity of the hybridization and
washing solutions. Hybridization requires that the two nucleic acids contain
complementary sequences; depending on the stringency of hybridization, however,
mismatches may be tolerated. Typically, hybridization of two sequences at high
stringency (such as, for example, in a solution of 0.5X SSC, at 65° C) requires that the
sequences be essentially completely homologous. Conditions of intermediate stringency
(such as, for example, 2X SSC at 65° C) and low stringency (such as, for example, 2X
SSC at 55° C), require correspondingly less overall complementarity between the
hybridizing sequences. (1X SSC is 0.15 M NaCl, 0.015 M Na citrate).
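
For convenience, the salt concentrations implied by the example conditions above can be worked out from the stated composition of 1X SSC. The sketch below is illustrative only; the dictionary keys and function name are hypothetical, and the three entries simply restate the examples given in the text.

```python
# Composition of 1X SSC as stated in the text.
NACL_M_PER_1X = 0.15
NA_CITRATE_M_PER_1X = 0.015

# Example conditions from the text: (SSC fold concentration, temperature in deg C).
EXAMPLE_CONDITIONS = {
    "high": (0.5, 65),
    "intermediate": (2.0, 65),
    "low": (2.0, 55),
}


def ssc_molarity(fold):
    """Return the NaCl and Na citrate molarity of an SSC solution at a given fold."""
    return {
        "NaCl_M": fold * NACL_M_PER_1X,
        "Na_citrate_M": fold * NA_CITRATE_M_PER_1X,
    }
```

On this basis 0.5X SSC corresponds to 0.075 M NaCl and 0.0075 M Na citrate, and 2X SSC corresponds to 0.30 M NaCl and 0.030 M Na citrate.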

The terms peptides, proteins, and polypeptides are used interchangeably herein.

As used herein, the term "surface protein" refers to all surface accessible proteins,
e.g., inner and outer membrane proteins, proteins adhering to the cell wall, and secreted
proteins.

A polypeptide has S. pneumoniae biological activity if it has one, two and
preferably more of the following properties: (1) when expressed in the course of an S.
pneumoniae infection, it can promote or mediate the attachment of S. pneumoniae to a
cell; (2) it has an enzymatic activity, structural or regulatory function characteristic of an
S. pneumoniae protein; or (3) the gene which encodes it can rescue a lethal mutation in
an S. pneumoniae gene. A polypeptide has biological activity if it is an antagonist,
agonist, or super-agonist of a polypeptide having one of the above-listed properties.

A biologically active fragment or analog is one having an in vivo or in vitro
activity which is characteristic of the S. pneumoniae polypeptides of the invention
contained in the Sequence Listing, or of other naturally occurring S. pneumoniae
polypeptides, e.g., one or more of the biological activities described herein. Especially
preferred are fragments which exist in vivo, e.g., fragments which arise from post-
transcriptional processing or which arise from translation of alternatively spliced RNAs.
Fragments include those expressed in native or endogenous cells as well as those made in
expression systems, e.g., in CHO cells. Because peptides such as S. pneumoniae
polypeptides often exhibit a range of physiological properties and because such
properties may be attributable to different portions of the molecule, a useful S.
pneumoniae fragment or S. pneumoniae analog is one which exhibits a biological activity
in any biological assay for S. pneumoniae activity. Most preferably the fragment or
analog possesses 10%, preferably 40%, more preferably 60%, 70%, 80% or 90% or
greater of the activity of S. pneumoniae, in any in vivo or in vitro assay.

Analogs can differ from naturally occurring S. pneumoniae polypeptides in amino
acid sequence or in ways that do not involve sequence, or both. Non-sequence
modifications include changes in acetylation, methylation, phosphorylation,
carboxylation, or glycosylation. Preferred analogs include S. pneumoniae polypeptides
(or biologically active fragments thereof) whose sequences differ from the wild-type
sequence by one or more conservative amino acid substitutions or by one or more non-
conservative amino acid substitutions, deletions, or insertions which do not substantially
diminish the biological activity of the S. pneumoniae polypeptide. Conservative
substitutions typically include the substitution of one amino acid for another with similar
characteristics, e.g., substitutions within the following groups: valine, glycine; glycine,
alanine; valine, isoleucine, leucine; aspartic acid, glutamic acid; asparagine, glutamine;
serine, threonine; lysine, arginine; and phenylalanine, tyrosine. Other conservative
substitutions can be made in view of the table below.



TABLE 1

CONSERVATIVE AMINO ACID REPLACEMENTS

For Amino Acid   Code   Replace with any of

Alanine          A      D-Ala, Gly, beta-Ala, L-Cys, D-Cys
Arginine         R      D-Arg, Lys, D-Lys, homo-Arg, D-homo-Arg, Met, Ile,
                        D-Met, D-Ile, Orn, D-Orn
Asparagine       N      D-Asn, Asp, D-Asp, Glu, D-Glu, Gln, D-Gln
Aspartic Acid    D      D-Asp, D-Asn, Asn, Glu, D-Glu, Gln, D-Gln
Cysteine         C      D-Cys, S-Me-Cys, Met, D-Met, Thr, D-Thr
Glutamine        Q      D-Gln, Asn, D-Asn, Glu, D-Glu, Asp, D-Asp
Glutamic Acid    E      D-Glu, D-Asp, Asp, Asn, D-Asn, Gln, D-Gln
Glycine          G      Ala, D-Ala, Pro, D-Pro, beta-Ala, Acp
Isoleucine       I      D-Ile, Val, D-Val, Leu, D-Leu, Met, D-Met
Leucine          L      D-Leu, Val, D-Val, Leu, D-Leu, Met, D-Met
Lysine           K      D-Lys, Arg, D-Arg, homo-Arg, D-homo-Arg, Met, D-Met,
                        Ile, D-Ile, Orn, D-Orn
Methionine       M      D-Met, S-Me-Cys, Ile, D-Ile, Leu, D-Leu, Val, D-Val
Phenylalanine    F      D-Phe, Tyr, D-Thr, L-Dopa, His, D-His, Trp, D-Trp,
                        Trans-3,4, or 5-phenylproline, cis-3,4, or
                        5-phenylproline
Proline          P      D-Pro, L-I-thioazolidine-4-carboxylic acid, D- or L-1-
                        oxazolidine-4-carboxylic acid
Serine           S      D-Ser, Thr, D-Thr, allo-Thr, Met, D-Met, Met(O),
                        D-Met(O), L-Cys, D-Cys
Threonine        T      D-Thr, Ser, D-Ser, allo-Thr, Met, D-Met, Met(O),
                        D-Met(O), Val, D-Val
Tyrosine         Y      D-Tyr, Phe, D-Phe, L-Dopa, His, D-His
Valine           V      D-Val, Leu, D-Leu, Ile, D-Ile, Met, D-Met



Other analogs within the invention are those with modifications which increase 
peptide stability; such analogs may contain, for example, one or more non-peptide bonds 
(which replace the peptide bonds) in the peptide sequence. Also included are: analogs 
that include residues other than naturally occurring L-amino acids, e.g., D-amino acids or
non-naturally occurring or synthetic amino acids, e.g., β or γ amino acids; and cyclic
analogs. 

As used herein, the term "fragment", as applied to an S. pneumoniae analog, will 
ordinarily be at least about 20 residues, more typically at least about 40 residues,
preferably at least about 60 residues in length. Fragments of S. pneumoniae polypeptides
can be generated by methods known to those skilled in the art. The ability of a candidate 
fragment to exhibit a biological activity of S. pneumoniae polypeptide can be assessed by 
methods known to those skilled in the art as described herein. Also included are S. 
pneumoniae polypeptides containing residues that are not required for biological activity
of the peptide or that result from alternative mRNA splicing or alternative protein
processing events. 

An "immunogenic component" as used herein is a moiety, such as an S. 
pneumoniae polypeptide, analog or fragment thereof, that is capable of eliciting a 
humoral and/or cellular immune response in a host animal. 

An "antigenic component" as used herein is a moiety, such as an S. pneumoniae
polypeptide, analog or fragment thereof, that is capable of binding to a specific antibody
with sufficiently high affinity to form a detectable antigen-antibody complex. 

The term "antibody" as used herein is intended to include fragments thereof 
which are specifically reactive with S. pneumoniae polypeptides. 

As used herein, the term "cell-specific promoter" means a DNA sequence that
serves as a promoter, i.e., regulates expression of a selected DNA sequence operably
linked to the promoter, and which effects expression of the selected DNA sequence in
specific cells of a tissue. The term also covers so-called "leaky" promoters, which
regulate expression of a selected DNA primarily in one tissue, but cause expression in
other tissues as well.

Misexpression, as used herein, refers to a non-wild type pattern of gene 
expression. It includes: expression at non-wild type levels, i.e., over or under expression; 
a pattern of expression that differs from wild type in terms of the time or stage at which
the gene is expressed, e.g., increased or decreased expression (as compared with wild 
type) at a predetermined developmental period or stage; a pattern of expression that 
differs from wild type in terms of decreased expression (as compared with wild type) in a 
predetermined cell type or tissue type; a pattern of expression that differs from wild type
in terms of the splicing size, amino acid sequence, post-translational modification, or
biological activity of the expressed polypeptide; a pattern of expression that differs from 
wild type in terms of the effect of an environmental stimulus or extracellular stimulus on 
expression of the gene, e.g., a pattern of increased or decreased expression (as compared 
with wild type) in the presence of an increase or decrease in the strength of the stimulus. 

As used herein, "host cells" and other such terms denoting microorganisms or
higher eukaryotic cell lines cultured as unicellular entities refer to cells which can
become or have been used as recipients for a recombinant vector or other transfer DNA,
and include the progeny of the original cell which has been transfected. It is understood
by individuals skilled in the art that the progeny of a single parental cell may not
necessarily be completely identical in genomic or total DNA complement to the original
parent, due to accidental or deliberate mutation.

As used herein, the term "control sequence" refers to a nucleic acid having a base
sequence which is recognized by the host organism to effect the expression of encoded
sequences to which it is ligated. The nature of such control sequences differs
depending upon the host organism; in prokaryotes, such control sequences generally
include a promoter, ribosomal binding site, terminators, and in some cases operators; in
eukaryotes, generally such control sequences include promoters, terminators and in some
instances, enhancers. The term control sequence is intended to include at a minimum, all
components whose presence is necessary for expression, and may also include additional
components whose presence is advantageous, for example, leader sequences.

As used herein, the term "operably linked" refers to sequences joined or ligated to
function in their intended manner. For example, a control sequence is operably linked to
a coding sequence by ligation in such a way that expression of the coding sequence is
achieved under conditions compatible with the control sequence and host cell.

The "metabolism" of a substance, as used herein, means any aspect of the 
expression, function, action, or regulation of the substance. The metabolism of a 
substance includes modifications, e.g., covalent or non-covalent modifications of the
substance. The metabolism of a substance includes modifications, e.g., covalent or non- 
covalent modification, the substance induces in other substances. The metabolism of a 
substance also includes changes in the distribution of the substance. The metabolism of a 
substance includes changes the substance induces in the distribution of other substances. 

A "sample" as used herein refers to a biological sample, such as, for example,
tissue or fluid isolated from an individual (including without limitation plasma, serum,
cerebrospinal fluid, lymph, tears, saliva and tissue sections) or from in vitro cell culture 
constituents, as well as samples from the environment. 

Technical and scientific terms used herein have the meanings commonly
understood by one of ordinary skill in the art to which the present invention pertains,
unless otherwise defined. Reference is made herein to various methodologies known to
those of skill in the art. Publications and other materials setting forth such known
methodologies to which reference is made are incorporated herein by reference in their
entireties as though set forth in full. The practice of the invention will employ, unless
otherwise indicated, conventional techniques of chemistry, molecular biology,
microbiology, recombinant DNA, and immunology, which are within the skill of the art.
Such techniques are explained fully in the literature. See, e.g., Sambrook, Fritsch, and
Maniatis, Molecular Cloning: A Laboratory Manual, 2nd ed. (1989); DNA Cloning,
Volumes I and II (D.N. Glover ed. 1985); Oligonucleotide Synthesis (M.J. Gait ed. 1984);
Nucleic Acid Hybridization (B.D. Hames & S.J. Higgins eds. 1984); the series, Methods
in Enzymology (Academic Press, Inc.), particularly Vol. 154 and Vol. 155 (Wu and
Grossman, eds.); PCR-A Practical Approach (McPherson, Quirke, and Taylor, eds.,
1991); Immunology, 2d Edition, 1989, Roitt et al., C.V. Mosby Company, New
York; Advanced Immunology, 2d Edition, 1991, Male et al., Gower Medical Publishing,
New York; DNA Cloning: A Practical Approach, Volumes I and II, 1985 (D.N. Glover
ed.); Oligonucleotide Synthesis, 1984 (M.J. Gait ed.); Transcription and Translation,
1984 (Hames and Higgins eds.); Animal Cell Culture, 1986 (R.I. Freshney ed.);
Immobilized Cells and Enzymes, 1986 (IRL Press); Perbal, 1984, A Practical Guide to
Molecular Cloning; and Gene Transfer Vectors for Mammalian Cells, 1987 (J. H. Miller
and M. P. Calos eds., Cold Spring Harbor Laboratory).

Any suitable materials and/or methods known to those of skill can be utilized in carrying
out the present invention; however, preferred materials and/or methods are described.
Materials, reagents and the like to which reference is made in the following description 
and examples are obtainable from commercial sources, unless otherwise noted. 

S. pneumoniae Genomic Sequence 

This invention provides nucleotide sequences of the genome of S. pneumoniae
which thus comprises a DNA sequence library of S. pneumoniae genomic DNA. The
detailed description that follows provides nucleotide sequences of S. pneumoniae, and
also describes how the sequences were obtained and how ORFs and protein-coding
sequences were identified. Also described are methods of using the disclosed S.
pneumoniae sequences in methods including diagnostic and therapeutic applications.
Furthermore, the library can be used as a database for identification and comparison of
medically important sequences in this and other strains of S. pneumoniae.

To determine the genomic sequence of S. pneumoniae, DNA was isolated from
strain 14453 of S. pneumoniae and mechanically sheared by nebulization to a median
size of 2 kb. Following size fractionation by gel electrophoresis, the fragments were
blunt-ended, ligated to adapter oligonucleotides, and cloned into each of 20 different
pMPX vectors (Rice et al., abstracts of Meeting of Genome Mapping and Sequencing,
Cold Spring Harbor, NY, 5/11-5/15, 1994, p. 225) and the pUC19 vector to construct a
series of "shotgun" subclone libraries.

DNA sequencing was achieved using two sequencing methods. The first method
used multiplex sequencing procedures essentially as disclosed in Church et al., 1988,
Science 240:185 and U.S. Patents No. 4,942,124 and 5,149,625. DNA was extracted from
pooled cultures and subjected to chemical or enzymatic sequencing. Sequencing
reactions were resolved by electrophoresis, and the products were transferred and
covalently bound to nylon membranes. Finally, the membranes were sequentially
hybridized with a series of labelled oligonucleotides complementary to "tag" sequences
present in the different shotgun cloning vectors. In this manner, a large number of
sequences could be obtained from a single set of sequencing reactions. The remainder of
the sequencing was performed on ABI377 automated DNA sequencers. The cloning and
sequencing procedures are described in more detail in the Exemplification.

Individual sequence reads were assembled using PHRAP (P. Green, Abstracts of
DOE Human Genome Program Contractor-Grantee Workshop V, Jan. 1996, p. 157). The
average contig length was about 3-4 kb. 

A variety of approaches are used to order the contigs so as to obtain a continuous
sequence representing the entire S. pneumoniae genome. Synthetic oligonucleotides are
designed that are complementary to sequences at the end of each contig. These
oligonucleotides may be hybridized to libraries of S. pneumoniae genomic DNA in, for
example, lambda phage vectors or plasmid vectors to identify clones that contain
sequences corresponding to the junctional regions between individual contigs. Such
clones are then used to isolate template DNA and the same oligonucleotides are used as
primers in polymerase chain reaction (PCR) to amplify junctional fragments, the
nucleotide sequence of which is then determined.

The S. pneumoniae sequences were analyzed for the presence of open reading
frames (ORFs) comprising at least 180 nucleotides. Because the ORFs were identified
from stop-to-stop codon reads, it should be understood that these ORFs may not
correspond to the ORF of a naturally-occurring S. pneumoniae polypeptide. These ORFs
may contain start codons which indicate the initiation of protein synthesis of a naturally-
occurring S. pneumoniae polypeptide. Such start codons within the ORFs provided
herein can be identified by those of ordinary skill in the relevant art, and the resulting
ORF and the encoded S. pneumoniae polypeptide are within the scope of this invention.
For example, within the ORFs a codon such as AUG or GUG (encoding methionine or
valine) which is part of the initiation signal for protein synthesis can be identified and the
portion of an ORF corresponding to a naturally-occurring S. pneumoniae polypeptide
can be recognized. The predicted coding regions were defined by evaluating the coding
potential of such sequences with the program GENEMARK™ (Borodovsky and
McIninch, 1993, Comp. Chem. 17:123).
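
The stop-to-stop ORF analysis and the subsequent search for an internal start codon can be illustrated by the following sketch. It is not the procedure actually used to generate the Sequence Listing. It assumes a DNA sequence (so that AUG and GUG appear as ATG and GTG), scans only the three reading frames of the strand supplied, and treats the beginning of each reading frame as a boundary in the same way as a stop codon; the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG"}  # DNA equivalents of AUG and GUG


def stop_to_stop_orfs(dna, min_length=180):
    """Return (start, end) index pairs of open stretches on the given strand.

    Each stretch begins just after a stop codon (or at the start of the
    reading frame) and ends at the first base of the next in-frame stop codon.
    Only stretches of at least min_length nucleotides are reported.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        orf_start = frame
        for pos in range(frame, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                if pos - orf_start >= min_length:
                    orfs.append((orf_start, pos))
                orf_start = pos + 3
    return orfs


def first_start_codon(dna, orf_start, orf_end):
    """Return the index of the first in-frame ATG or GTG within an ORF, or None."""
    dna = dna.upper()
    for pos in range(orf_start, orf_end - 2, 3):
        if dna[pos:pos + 3] in START_CODONS:
            return pos
    return None
```

To cover both strands under the same assumptions, the functions would be applied a second time to the reverse complement of the sequence. The first in-frame start codon is only a candidate; as noted above, identifying the naturally occurring initiation site remains a matter for those of ordinary skill in the art.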

Each predicted ORF amino acid sequence was compared with all sequences
found in current GENBANK, SWISS-PROT, and PIR databases using the BLAST
algorithm. BLAST identifies local alignments between the ORF sequence and the
sequences in the databank and estimates the probability that each alignment occurred
by chance (Altschul et al., 1990, J. Mol. Biol. 215:403-410). Homologous ORFs
(probabilities less than 10⁻⁵ by chance) and ORFs that are
probably non-homologous (probabilities greater than 10⁻⁵ by chance) but have good
codon usage were identified. Both homologous sequences and non-homologous
sequences with good codon usage are likely to encode proteins and are encompassed by
the invention.
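
The selection rule described above may be sketched as follows, purely by way of illustration. The text does not state how a probability exactly equal to the threshold is treated, nor how codon usage was scored; the sketch assumes that equality falls on the non-homologous side and takes the codon usage judgement as a yes/no input. The function name and labels are hypothetical.

```python
HOMOLOGY_THRESHOLD = 1e-5  # chance probability cut-off stated in the text


def classify_orf(chance_probability, has_good_codon_usage):
    """Label an ORF from its BLAST chance probability and a codon usage flag."""
    if chance_probability < HOMOLOGY_THRESHOLD:
        return "homologous"
    if has_good_codon_usage:
        return "probably non-homologous, good codon usage"
    return "not selected"
```

ORFs receiving either of the first two labels correspond to the sequences described above as likely to encode proteins.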

S. pneumoniae Nucleic Acids 

The nucleic acids of this invention may be obtained directly from the DNA of the
above referenced S. pneumoniae strain by using the polymerase chain reaction (PCR).
See "PCR, A Practical Approach" (McPherson, Quirke, and Taylor, eds., IRL Press,
Oxford, UK, 1991) for details about the PCR. High fidelity PCR can be used to ensure a
faithful DNA copy prior to expression. In addition, the authenticity of amplified
products can be verified by conventional sequencing methods. Clones carrying the
desired sequences described in this invention may also be obtained by screening the
libraries by means of the PCR or by hybridization of synthetic oligonucleotide probes to
filter lifts of the library colonies or plaques as known in the art (see, e.g., Sambrook et al.,
Molecular Cloning, A Laboratory Manual 2nd edition, 1989, Cold Spring Harbor Press,
NY).

It is also possible to obtain nucleic acids encoding S. pneumoniae polypeptides
from a cDNA library in accordance with protocols herein described. A cDNA encoding 
an S. pneumoniae polypeptide can be obtained by isolating total mRNA from an 
appropriate strain. Double stranded cDNAs can then be prepared from the total mRNA. 
Subsequently, the cDNAs can be inserted into a suitable plasmid or viral (e.g.,
bacteriophage) vector using any one of a number of known techniques. Genes encoding
S. pneumoniae polypeptides can also be cloned using established polymerase chain 
reaction techniques in accordance with the nucleotide sequence information provided by 
the invention. The nucleic acids of the invention can be DNA or RNA. Preferred nucleic 
acids of the invention are contained in the Sequence Listing. 

The nucleic acids of the invention can also be chemically synthesized using
standard techniques. Various methods of chemically synthesizing polydeoxynucleotides
are known, including solid-phase synthesis which, like peptide synthesis, has been fully
automated in commercially available DNA synthesizers (See, e.g., Itakura et al., U.S.
Patent No. 4,598,049; Caruthers et al., U.S. Patent No. 4,458,066; and Itakura, U.S. Patent
Nos. 4,401,796 and 4,373,071, incorporated by reference herein).

Nucleic acids isolated or synthesized in accordance with features of the present 
invention are useful, by way of example, without limitation, as probes, primers, capture
ligands, antisense genes and for developing expression systems for the synthesis of 
proteins and peptides corresponding to such sequences. As probes, primers, capture 
ligands and antisense agents, the nucleic acid normally consists of all or part 
(approximately twenty or more nucleotides for specificity as well as the ability to form 
stable hybridization products) of the nucleic acids of the invention contained in the
Sequence Listing. These uses are described in further detail below. 

Probes 

A nucleic acid isolated or synthesized in accordance with the sequence of the 
invention contained in the Sequence Listing can be used as a probe to specifically detect
S. pneumoniae. With the sequence information set forth in the present application,
sequences of twenty or more nucleotides are identified which provide the desired 
inclusivity and exclusivity with respect to S. pneumoniae, and extraneous nucleic acids 
likely to be encountered during hybridization conditions. More preferably, the sequence 
will comprise at least twenty to thirty nucleotides to convey stability to the hybridization
product formed between the probe and the intended target molecules.
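
One simple way to shortlist candidate probe sequences of twenty nucleotides is sketched below. It is offered only as an illustration and rests on a strong simplifying assumption: exclusivity is approximated by requiring that the candidate have no exact match, on either strand, in a set of extraneous sequences, whereas real hybridization tolerates mismatches depending on stringency. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the anti-parallel complement of an upper-case DNA strand."""
    return dna.translate(_COMPLEMENT)[::-1]


def subsequences(dna, length):
    """Return the set of all contiguous subsequences of the given length."""
    return {dna[i:i + length] for i in range(len(dna) - length + 1)}


def candidate_probes(target, extraneous, length=20):
    """List subsequences of the target absent from both strands of every extraneous sequence."""
    excluded = set()
    for other in extraneous:
        other = other.upper()
        excluded |= subsequences(other, length)
        excluded |= subsequences(reverse_complement(other), length)
    return sorted(subsequences(target.upper(), length) - excluded)
```

Candidates selected in this way would still need to be checked experimentally under the intended hybridization conditions.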

Sequences larger than 1000 nucleotides in length are difficult to synthesize but 
can be generated by recombinant DNA techniques. Individuals skilled in the art will 
readily recognize that the nucleic acids, for use as probes, can be provided with a label to 
facilitate detection of a hybridization product. 

Nucleic acid isolated or synthesized in accordance with the sequence of the
invention contained in the Sequence Listing can also be useful as probes to detect
homologous regions (especially homologous genes) of other Streptococcus species using
appropriate stringency hybridization conditions as described herein.

Capture Ligand

For use as a capture ligand, the nucleic acid selected in the manner described
above with respect to probes, can be readily associated with a support. The manner in
which nucleic acid is associated with supports is well known. Nucleic acid having
twenty or more nucleotides in a sequence of the invention contained in the Sequence
Listing has utility to separate S. pneumoniae nucleic acids from one another and from
the nucleic acid of other organisms. Nucleic acid having twenty or more nucleotides in a
sequence of the invention contained in the Sequence Listing can also have utility to
separate other Streptococcus species from each other and from other organisms.
Preferably, the sequence will comprise at least twenty nucleotides to convey stability to
the hybridization product formed between the probe and the intended target molecules.
Sequences larger than 1000 nucleotides in length are difficult to synthesize but can be
generated by recombinant DNA techniques.

Primers

Nucleic acid isolated or synthesized in accordance with the sequences described
herein has utility as primers for the amplification of S. pneumoniae nucleic acid. These
nucleic acids may also have utility as primers for the amplification of nucleic acids in
other Streptococcus species. With respect to polymerase chain reaction (PCR)
techniques, nucleic acid sequences of at least 10-15 nucleotides of the invention contained
in the Sequence Listing have utility in conjunction with suitable enzymes and reagents to
create copies of S. pneumoniae nucleic acid. More preferably, the sequence will
comprise twenty or more nucleotides to convey stability to the hybridization product
formed between the primer and the intended target molecules. Binding conditions of
primers greater than 100 nucleotides are more difficult to control to obtain specificity.
High fidelity PCR can be used to ensure a faithful DNA copy prior to expression. In
addition, amplified products can be checked by conventional sequencing methods.
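
By way of non-limiting illustration, the length guidance above can be sketched in a few lines of Python. The length bands (fewer than 10, 10 to 19, 20 to 100, more than 100 nucleotides) are one reading of the ranges stated above. The melting-temperature estimate uses the Wallace rule of thumb, 2(A+T) + 4(G+C) degrees C, which is an assumption added for illustration and is not part of the disclosure. The function names and the example primer are hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Rule-of-thumb melting temperature (degrees C) for a short oligonucleotide.

    Wallace rule: 2 * (A + T) + 4 * (G + C). Illustrative assumption only.
    """
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def classify_primer_length(primer: str) -> str:
    """Map a primer length onto the ranges discussed in the text."""
    n = len(primer)
    if n < 10:
        return "too short"          # below the 10-15 nucleotide lower bound
    if n < 20:
        return "usable"             # at least 10-15 nucleotides
    if n <= 100:
        return "preferred"          # twenty or more nucleotides
    return "hard to control"        # binding of primers > 100 nt is harder to control


if __name__ == "__main__":
    example = "ATGGCTAAGCTTGGATCCAT"  # hypothetical 20-mer, not from the Sequence Listing
    print(len(example), classify_primer_length(example), wallace_tm(example))
```

The Wallace rule is only a rough guide for short oligonucleotides; it is used here because it needs no parameters beyond base counts.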

The copies can be used in diagnostic assays to detect specific sequences, 
including genes from S. pneumoniae and/or other Streptococcus species. The copies can
also be incorporated into cloning and expression vectors to generate polypeptides
corresponding to the nucleic acid synthesized by PCR, as is described in greater detail
herein. 

Antisense 

Nucleic acid or nucleic acid-hybridizing derivatives isolated or synthesized in
accordance with the sequences described herein have utility as antisense agents to
prevent the expression of S. pneumoniae genes. These sequences also have utility as
antisense agents to prevent expression of genes of other Streptococcus species.

In one embodiment, nucleic acid or derivatives corresponding to S. pneumoniae
nucleic acids are loaded into a suitable carrier such as a liposome or bacteriophage for
introduction into bacterial cells. For example, a nucleic acid having twenty or more
nucleotides is capable of binding to bacterial nucleic acid or bacterial messenger RNA.
Preferably, the antisense nucleic acid comprises 20 or more nucleotides to provide
necessary stability of a hybridization product of non-naturally occurring nucleic acid and
bacterial nucleic acid and/or bacterial messenger RNA. Nucleic acid having a sequence
greater than 1000 nucleotides in length is difficult to synthesize but can be generated by
recombinant DNA techniques. Methods for loading antisense nucleic acid in liposomes
are known in the art, as exemplified by U.S. Patent 4,241,046 issued December 23, 1980 to
Papahadjopoulos et al.
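
As a minimal sketch of how antisense candidates of twenty nucleotides might be enumerated, the following Python computes the reverse complement of each 20-nucleotide window of a sense-strand DNA sequence. The function names, the window-by-window enumeration, and the example sequence are hypothetical and are offered only to illustrate the complementarity on which antisense binding depends.

```python
from typing import Iterator, Tuple

# Translation table for Watson-Crick complements of DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.translate(_COMPLEMENT)[::-1]


def antisense_candidates(sense: str, length: int = 20) -> Iterator[Tuple[int, str]]:
    """Yield (start, oligo) pairs, one per window of the sense strand.

    Each oligo is the reverse complement of sense[start:start + length],
    i.e. a DNA oligonucleotide able to hybridize to that stretch of the
    sense strand or of the corresponding messenger RNA.
    """
    for start in range(len(sense) - length + 1):
        yield start, reverse_complement(sense[start:start + length])


if __name__ == "__main__":
    target = "ATGACCAAGGTTCTGGAAGCTATCCGT"  # hypothetical sequence
    for start, oligo in antisense_candidates(target):
        print(start, oligo)
```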

The present invention encompasses isolated polypeptides and nucleic acids 
derived from S. pneumoniae that are useful as reagents for diagnosis of bacterial 
infection, components of effective antibacterial vaccines, and/or as targets for
antibacterial drugs, including anti-S. pneumoniae drugs.

Expression of S. pneumoniae Nucleic Acids 

Table 2 provides a list of open reading frames (ORFs) in both strands. An ORF is
a region of nucleic acid which encodes a polypeptide. This region may represent a
portion of a coding sequence or a total sequence and was determined from stop to stop
codons. The first column lists the ORF designation. The second and third columns list
the SEQ ID numbers for the nucleic acid and amino acid sequences corresponding to
each ORF, respectively. The fourth and fifth columns list the length of the nucleic acid
ORF and the length of the amino acid ORF, respectively. The nucleotide sequence
corresponding to each ORF begins at the first nucleotide immediately following a stop
codon and ends at the nucleotide immediately preceding the next downstream stop codon
in the same reading frame. It will be recognized by one skilled in the art that the natural
translation initiation sites will correspond to ATG, GTG, or TTG codons located within
the ORFs. The natural initiation sites depend not only on the sequence of a start codon
but also on the context of the DNA sequence adjacent to the start codon. Usually, a
recognizable ribosome binding site is found within 20 nucleotides upstream from the
initiation codon. In some cases where genes are translationally coupled and coordinately
expressed together in "operons", ribosome binding sites are not present, but the initiation
codon of a downstream gene may occur very close to, or overlap, the stop codon of an
upstream gene in the same operon. The correct start codons can be generally identified
without undue experimentation because only a few codons need be tested. It is
recognized that the translational machinery in bacteria initiates all polypeptide chains
with the amino acid methionine, regardless of the sequence of the start codon. In some
cases, polypeptides are post-translationally modified, resulting in an N-terminal amino
acid other than methionine in vivo. The sixth and seventh columns provide metrics for
assessing the likelihood of the homology match (determined by the BLASTP2
algorithm), as is known in the art, to the genes indicated in the eighth column.
Specifically, the sixth column represents the "score" for the match (a higher score is a
better match), and the seventh column represents the "P-value" for the match (the
probability that such a match could have occurred by chance; the lower the value, the
more likely the match is valid). If a BLASTP2 score of less than 46 was obtained, no
value is reported in the table for the "P-value". The eighth column provides, where
available, the accession number (AC) or the Swissprot accession number (SP), the locus
name (LN), Superfamily Classification (CL), the Organism (OR), Source of variant
(SR), E.C. number (EC), the gene name (GN), the product name (PN), the Function
Description (FN), the Map Position (MP), Left End (LE), Right End (RE), Coding
Direction (DI), the Database from which the sequence originates (DB), and the
description (DE) or notes (NT) for each ORF. This information allows one of ordinary
skill in the art to determine a potential use for each identified coding sequence and, as a
result, allows the polypeptides of the present invention to be used for commercial and
industrial purposes.
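
The stop-to-stop ORF definition and the candidate start codons described above can be sketched as follows. This is a minimal illustration under stated assumptions and is not the procedure actually used to generate Table 2: it assumes the standard stop codons TAA, TAG and TGA, scans three frames on each strand, and reports only ORFs bounded by a stop codon on both sides. All function names and the example sequence are hypothetical.

```python
from typing import List, Tuple

STOPS = {"TAA", "TAG", "TGA"}    # assumed stop codons
STARTS = {"ATG", "GTG", "TTG"}   # candidate initiation codons named in the text
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(_COMPLEMENT)[::-1]


def stop_to_stop_orfs(seq: str, min_codons: int = 1) -> List[Tuple[str, int, int, int]]:
    """Return (strand, frame, start, end) for each stop-to-stop ORF.

    start is the first nucleotide after a stop codon and end is the first
    nucleotide of the next in-frame stop codon (0-based, end exclusive).
    Coordinates refer to the strand being scanned.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            after_stop = None  # position just past the previous in-frame stop
            for i in range(frame, len(s) - 2, 3):
                if s[i:i + 3] in STOPS:
                    if after_stop is not None and (i - after_stop) // 3 >= min_codons:
                        orfs.append((strand, frame, after_stop, i))
                    after_stop = i + 3
    return orfs


def candidate_starts(orf: str) -> List[int]:
    """Offsets of in-frame ATG, GTG or TTG codons within an ORF."""
    orf = orf.upper()
    return [i for i in range(0, len(orf) - 2, 3) if orf[i:i + 3] in STARTS]


if __name__ == "__main__":
    dna = "TAAATGGCTTGA"  # hypothetical sequence
    for strand, frame, start, end in stop_to_stop_orfs(dna):
        print(strand, frame, start, end, candidate_starts(dna[start:end]))
```

Because only a few in-frame ATG, GTG or TTG codons typically occur near the 5' end of an ORF, a list such as the one returned by candidate_starts gives the small set of start sites that would need to be tested.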


Using the information provided in SEQ ID NO: 1 - SEQ ID NO: 2603 and in 
Table 2 together with routine cloning and sequencing methods, one of ordinary skill in 
the art will be able to clone and sequence all the nucleic acid fragments of interest 
including open reading frames (ORFs) encoding a large variety of proteins of S.
pneumoniae.

Nucleic acid isolated or synthesized in accordance with the sequences described
herein has utility to generate polypeptides. The nucleic acid of the invention
exemplified in SEQ ID NO: 1 - SEQ ID NO: 2603 and in Table 2 or fragments of said
nucleic acid encoding active portions of S. pneumoniae polypeptides can be cloned into
suitable vectors or used to isolate nucleic acid. The isolated nucleic acid is combined
with suitable DNA linkers and cloned into a suitable vector.

The function of a specific gene or operon can be ascertained by expression in a
bacterial strain under conditions where the activity of the gene product(s) specified by the
gene or operon in question can be specifically measured. Alternatively, a gene product
may be produced in large quantities in an expressing strain for use as an antigen, an
industrial reagent, for structural studies, etc. This expression can be accomplished in a
mutant strain which lacks the activity of the gene to be tested, or in a strain that does not
produce the same gene product(s). This includes, but is not limited to, eucaryotic species
such as the yeast Saccharomyces cerevisiae, Methanobacterium strains or other Archaea,
and Eubacteria such as E. coli, B. subtilis, S. aureus, S. pneumoniae or Pseudomonas
putida. In some cases the expression host will utilize the natural S. pneumoniae
promoter whereas in others, it will be necessary to drive the gene with a promoter
sequence derived from the expressing organism (e.g., an E. coli beta-galactosidase
promoter for expression in E. coli).

To express a gene product using the natural S. pneumoniae promoter, a procedure
such as the following can be used. A restriction fragment containing the gene of interest,
together with its associated natural promoter element and regulatory sequences
(identified using the DNA sequence data) is cloned into an appropriate recombinant
plasmid containing an origin of replication that functions in the host organism and an
appropriate selectable marker. This can be accomplished by a number of procedures
known to those skilled in the art. It is most preferably done by cutting the plasmid and
the fragment to be cloned with the same restriction enzyme to produce compatible ends
that can be ligated to join the two pieces together. The recombinant plasmid is
introduced into the host organism by, for example, electroporation and cells containing
the recombinant plasmid are identified by selection for the marker on the plasmid.
Expression of the desired gene product is detected using an assay specific for that gene
product.

In the case of a gene that requires a different promoter, the body of the gene
(coding sequence) is specifically excised and cloned into an appropriate expression
plasmid. This subcloning can be done by several methods, but is most easily
accomplished by PCR amplification of a specific fragment and ligation into an
expression plasmid after treating the PCR product with a restriction enzyme or
exonuclease to create suitable ends for cloning.

A suitable host cell for expression of a gene can be any procaryotic or eucaryotic
cell. For example, an S. pneumoniae polypeptide can be expressed in bacterial cells such
as E. coli or B. subtilis, insect cells (baculovirus), yeast, or mammalian cells such as
Chinese hamster ovary (CHO) cells. Other suitable host cells are known to those skilled
in the art.

Expression in eucaryotic cells such as mammalian, yeast, or insect cells can lead
to partial or complete glycosylation and/or formation of relevant inter- or intra-chain
disulfide bonds of a recombinant peptide product. Examples of vectors for expression in
yeast S. cerevisiae include pYepSec1 (Baldari et al., (1987) EMBO J. 6:229-234), pMFa
(Kurjan and Herskowitz, (1982) Cell 30:933-943), pJRY88 (Schultz et al., (1987) Gene
54:113-123), and pYES2 (Invitrogen Corporation, San Diego, CA). Baculovirus vectors
available for expression of proteins in cultured insect cells (SF9 cells) include the pAc
series (Smith et al., (1983) Mol. Cell. Biol. 3:2156-2165) and the pVL series (Lucklow,
V.A., and Summers, M.D., (1989) Virology 170:31-39). Generally, COS cells (Gluzman,
Y., (1981) Cell 23:175-182) are used in conjunction with such vectors as pCDM8
(Aruffo, A. and Seed, B., (1987) Proc. Natl. Acad. Sci. USA 84:8573-8577) for transient
amplification/expression in mammalian cells, while CHO (dhfr- Chinese Hamster Ovary)
cells are used with vectors such as pMT2PC (Kaufman et al. (1987), EMBO J. 6:187-195)
for stable amplification/expression in mammalian cells. Vector DNA can be
introduced into mammalian cells via conventional techniques such as calcium phosphate
or calcium chloride co-precipitation, DEAE-dextran-mediated transfection, or
electroporation. Suitable methods for transforming host cells can be found in Sambrook
et al. (Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold Spring Harbor
Laboratory Press (1989)), and other laboratory textbooks.

Expression in procaryotes is most often carried out in E. coli with either fusion or
non-fusion inducible expression vectors. Fusion vectors usually add a number of NH2
terminal amino acids to the expressed target gene. These NH2 terminal amino acids
often are referred to as a reporter group or an affinity purification group. Such reporter
groups usually serve two purposes: 1) to increase the solubility of the target recombinant
protein; and 2) to aid in the purification of the target recombinant protein by acting as a
ligand in affinity purification. Often, in fusion expression vectors, a proteolytic cleavage
site is introduced at the junction of the reporter group and the target recombinant protein
to enable separation of the target recombinant protein from the reporter group subsequent
to purification of the fusion protein. Enzymes that cleave at such sites, and their cognate
recognition sequences, include Factor Xa, thrombin and enterokinase. Typical fusion
expression vectors include pGEX (Amrad Corp., Melbourne, Australia), pMAL (New England
Biolabs, Beverly, MA) and pRIT5 (Pharmacia, Piscataway, NJ) which fuse glutathione
S-transferase, maltose E binding protein, or protein A, respectively, to the target
recombinant protein. A preferred reporter group is poly(His), which may be fused to the
amino or carboxy terminus of the protein and which renders the recombinant fusion
protein easily purifiable by metal chelate chromatography.

Inducible non-fusion expression vectors include pTrc (Amann et al., (1988) Gene
69:301-315) and pET11d (Studier et al., Gene Expression Technology: Methods in
Enzymology 185, Academic Press, San Diego, California (1990) 60-89). While target
gene expression relies on host RNA polymerase transcription from the hybrid trp-lac
fusion promoter in pTrc, expression of target genes inserted into pET11d relies on
transcription from the T7 gn10-lac 0 fusion promoter mediated by coexpressed viral
RNA polymerase (T7 gn1). This viral polymerase is supplied by host strains BL21(DE3)
or HMS174(DE3) from a resident lambda prophage harboring a T7 gn1 under the
transcriptional control of the lacUV5 promoter.

For example, a host cell transfected with a nucleic acid vector directing
expression of a nucleotide sequence encoding an S. pneumoniae polypeptide can be
cultured under appropriate conditions to allow expression of the polypeptide to occur.
The polypeptide may be secreted and isolated from a mixture of cells and medium
containing the peptide. Alternatively, the polypeptide may be retained cytoplasmically
and the cells harvested, lysed and the protein isolated. A cell culture includes host cells,
media and other byproducts. Suitable media for cell culture are well known in the art.
Polypeptides of the invention can be isolated from cell culture medium, host cells, or
both using techniques known in the art for purifying proteins including ion-exchange
chromatography, gel filtration chromatography, ultrafiltration, electrophoresis, and
immunoaffinity purification with antibodies specific for such polypeptides. Additionally,
in many situations, polypeptides can be produced by chemical cleavage of a native
protein (e.g., tryptic digestion) and the cleavage products can then be purified by standard
techniques.

In the case of membrane bound proteins, these can be isolated from a host cell by
contacting a membrane-associated protein fraction with a detergent forming a solubilized
complex, where the membrane-associated protein is no longer entirely embedded in the
membrane fraction and is solubilized at least to an extent which allows it to be
chromatographically isolated from the membrane fraction. Several different criteria are
used for choosing a detergent suitable for solubilizing these complexes. For example,
one property considered is the ability of the detergent to solubilize the S. pneumoniae
protein within the membrane fraction with minimal denaturation of the membrane-associated
protein, allowing for the activity or functionality of the membrane-associated
protein to return upon reconstitution of the protein. Another property considered when
selecting the detergent is the critical micelle concentration (CMC) of the detergent, in that
the detergent of choice preferably has a high CMC value allowing for ease of removal
after reconstitution. A third property considered when selecting a detergent is the
hydrophobicity of the detergent. Typically, membrane-associated proteins are very
hydrophobic and therefore detergents which are also hydrophobic, e.g., the Triton series,
would be useful for solubilizing the hydrophobic proteins. Another property important to
a detergent can be the capability of the detergent to remove the S. pneumoniae protein
with minimal protein-protein interaction, facilitating further purification. A fifth property
of the detergent which should be considered is the charge of the detergent. For example,
if it is desired to use ion exchange resins in the purification process then preferably the
detergent should be an uncharged detergent. Chromatographic techniques which can be
used in the final purification step are known in the art and include hydrophobic
interaction, lectin affinity, ion exchange, dye affinity and immunoaffinity.

One strategy to maximize recombinant S. pneumoniae peptide expression in E.
coli is to express the protein in a host bacterium with an impaired capacity to
proteolytically cleave the recombinant protein (Gottesman, S., Gene Expression
Technology: Methods in Enzymology 185, Academic Press, San Diego, California
(1990) 119-128). Another strategy would be to alter the nucleic acid encoding an S.
pneumoniae peptide to be inserted into an expression vector so that the individual codons
for each amino acid would be those preferentially utilized in highly expressed E. coli
proteins (Wada et al., (1992) Nuc. Acids Res. 20:2111-2118). Such alteration of nucleic
acids of the invention can be carried out by standard DNA synthesis techniques.

The nucleic acids of the invention can also be chemically synthesized using
standard techniques. Various methods of chemically synthesizing polydeoxynucleotides
are known, including solid-phase synthesis which, like peptide synthesis, has been fully
automated in commercially available DNA synthesizers (See, e.g., Itakura et al. U.S.
Patent No. 4,598,049; Caruthers et al. U.S. Patent No. 4,458,066; and Itakura U.S. Patent
Nos. 4,401,796 and 4,373,071, incorporated by reference herein).

The present invention provides a library of S. pneumoniae-derived nucleic acid
sequences. The libraries provide probes, primers, and markers which can be used as
markers in epidemiological studies. The present invention also provides a library of S.
pneumoniae-derived nucleic acid sequences which comprise or encode targets for
therapeutic drugs.

Nucleic acids comprising any of the sequences disclosed herein or sub-sequences
thereof can be prepared by standard methods using the nucleic acid sequence information
provided in SEQ ID NO: 1 - SEQ ID NO: 2603. For example, DNA can be chemically
synthesized using, e.g., the phosphoramidite solid support method of Matteucci et al.,
1981, J. Am. Chem. Soc. 103:3185, the method of Yoo et al., 1989, J. Biol. Chem.
264:17078, or other well known methods. This can be done by sequentially linking a
series of oligonucleotide cassettes comprising pairs of synthetic oligonucleotides, as
described below.

Of course, due to the degeneracy of the genetic code, many different nucleotide 
sequences can encode polypeptides having the amino acid sequences defined by SEQ ID
NO: 2604 - SEQ ID NO: 5206 or sub-sequences thereof. The codons can be selected for 
optimal expression in prokaryotic or eukaryotic systems. Such degenerate variants are 
also encompassed by this invention. 
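
To give a sense of scale for the degeneracy noted above, the number of distinct nucleotide sequences encoding a given peptide is the product of the number of codons available for each residue. The sketch below assumes the standard genetic code (61 sense codons); the table name, the function name and the example peptides are hypothetical.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code (assumed here).
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encodings(peptide: str) -> int:
    """Number of distinct coding sequences for a peptide (stop codon excluded).

    Raises KeyError for any symbol other than the 20 standard residues.
    """
    return prod(CODON_COUNTS[residue] for residue in peptide.upper())


if __name__ == "__main__":
    for pep in ("MW", "MLS", "MKTLLRS"):  # hypothetical peptides
        print(pep, count_encodings(pep))
```

Even short peptides therefore admit many degenerate coding sequences, which is why codon choice can be tuned to the expression host without changing the encoded polypeptide.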

Insertion of nucleic acids (typically DNAs) encoding the polypeptides of the 
invention into a vector is easily accomplished when the termini of both the DNAs and the
vector comprise compatible restriction sites. If this cannot be done, it may be necessary
to modify the termini of the DNAs and/or vector by digesting back single-stranded DNA
overhangs generated by restriction endonuclease cleavage to produce blunt ends, or to
achieve the same result by filling in the single-stranded termini with an appropriate DNA 
polymerase. 

Alternatively, any site desired may be produced, e.g., by ligating nucleotide 
sequences (linkers) onto the termini. Such linkers may comprise specific oligonucleotide
sequences that define desired restriction sites. Restriction sites can also be generated by
the use of the polymerase chain reaction (PCR). See, e.g., Saiki et al., 1988, Science
239:48. The cleaved vector and the DNA fragments may also be modified if required by 
homopolymeric tailing. 

In certain embodiments, the invention encompasses isolated nucleic acid
fragments comprising all or part of the individual nucleic acid sequences disclosed
herein. The fragments are at least about 8 nucleotides in length, preferably at least about
12 nucleotides in length, and most preferably at least about 15-20 nucleotides in length.
The nucleic acids may be isolated directly from cells. Alternatively, the
polymerase chain reaction (PCR) method can be used to produce the nucleic acids of the
invention, using either chemically synthesized strands or genomic material as templates.
Primers used for PCR can be synthesized using the sequence information provided herein
and can further be designed to introduce appropriate new restriction sites, if desirable, to
facilitate incorporation into a given vector for recombinant expression.
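
One way primers might be designed to introduce new restriction sites is sketched below: a recognition site is prepended to the 5' end of each primer, and the template is checked for internal copies of the same site. The recognition sequences for EcoRI, BamHI and HindIII are well known; the two extra 5' bases (the "clamp"), the annealing length, the function names and the example template are assumptions made for illustration only.

```python
from typing import Tuple

# Well-known recognition sequences; the choice of enzymes is illustrative.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(_COMPLEMENT)[::-1]


def has_internal_site(template: str, enzyme: str) -> bool:
    """True if the template already contains the enzyme's recognition site."""
    return SITES[enzyme] in template.upper()


def tailed_primers(template: str, fwd_enzyme: str, rev_enzyme: str,
                   anneal_len: int = 20, clamp: str = "GC") -> Tuple[str, str]:
    """Return (forward, reverse) primers, written 5' to 3'.

    Each primer is clamp + restriction site + template-specific region.
    The clamp (extra 5' bases) is a hypothetical design choice.
    """
    template = template.upper()
    fwd = clamp + SITES[fwd_enzyme] + template[:anneal_len]
    rev = clamp + SITES[rev_enzyme] + reverse_complement(template[-anneal_len:])
    return fwd, rev


if __name__ == "__main__":
    orf = "ATGAAACCCGGGTTTTAA"  # hypothetical coding sequence
    if not (has_internal_site(orf, "EcoRI") or has_internal_site(orf, "BamHI")):
        print(tailed_primers(orf, "EcoRI", "BamHI", anneal_len=6))
```

Checking for internal sites first avoids cutting the amplified fragment itself when the PCR product is digested before ligation.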

The nucleic acids of the present invention may be flanked by natural S.
pneumoniae regulatory sequences, or may be associated with heterologous sequences,
including promoters, enhancers, response elements, signal sequences, polyadenylation
sequences, introns, 5'- and 3'-noncoding regions, and the like. The nucleic acids may
also be modified by many means known in the art. Non-limiting examples of such
modifications include methylation, "caps", substitution of one or more of the naturally
occurring nucleotides with an analog, internucleotide modifications such as, for example,
those with uncharged linkages (e.g., methyl phosphonates, phosphotriesters,
phosphoroamidates, carbamates, etc.) and with charged linkages (e.g., phosphorothioates,
phosphorodithioates, etc.). Nucleic acids may contain one or more additional covalently
linked moieties, such as, for example, proteins (e.g., nucleases, toxins, antibodies, signal
peptides, poly-L-lysine, etc.), intercalators (e.g., acridine, psoralen, etc.), chelators (e.g.,
metals, radioactive metals, iron, oxidative metals, etc.), and alkylators. PNAs are also
included. The nucleic acid may be derivatized by formation of a methyl or ethyl
phosphotriester or an alkyl phosphoramidate linkage. Furthermore, the nucleic acid
sequences of the present invention may also be modified with a label capable of
providing a detectable signal, either directly or indirectly. Exemplary labels include
radioisotopes, fluorescent molecules, biotin, and the like.

The invention also provides nucleic acid vectors comprising the disclosed S.
pneumoniae-derived sequences or derivatives or fragments thereof. A large number of
vectors, including plasmid and fungal vectors, have been described for replication and/or
expression in a variety of eukaryotic and prokaryotic hosts, and may be used for gene
therapy as well as for simple cloning or protein expression.

The encoded S. pneumoniae polypeptides may be expressed by using many 
known vectors, such as pUC plasmids, pET plasmids (Novagen, Inc., Madison, WI), or 
pRSET or pREP (Invitrogen, San Diego, CA), and many appropriate host cells, using 
methods disclosed or cited herein or otherwise known to those skilled in the relevant art.
The particular choice of vector/host is not critical to the practice of the invention.

Recombinant cloning vectors will often include one or more replication systems
for cloning or expression, one or more markers for selection in the host, e.g. antibiotic
resistance, and one or more expression cassettes. The inserted S. pneumoniae coding
sequences may be synthesized by standard methods, isolated from natural sources, or
prepared as hybrids, etc. Ligation of the S. pneumoniae coding sequences to
transcriptional regulatory elements and/or to other amino acid coding sequences may be
achieved by known methods. Suitable host cells may be transformed/transfected/infected
as appropriate by any suitable method including electroporation, CaCl2-mediated DNA
uptake, fungal infection, microinjection, microprojectile, or other established methods.

Appropriate host cells include bacteria, archaebacteria, fungi, especially yeast, and
plant and animal cells, especially mammalian cells. Of particular interest are S.
pneumoniae, E. coli, B. subtilis, Saccharomyces cerevisiae, Saccharomyces
carlsbergensis, Schizosaccharomyces pombe, SF9 cells, C129 cells, 293 cells,
Neurospora, and CHO cells, COS cells, HeLa cells, and immortalized mammalian
myeloid and lymphoid cell lines. Preferred replication systems include M13, ColE1,
SV40, baculovirus, lambda, adenovirus, and the like. A large number of transcription
initiation and termination regulatory regions have been isolated and shown to be effective
in the transcription and translation of heterologous proteins in the various hosts.
Examples of these regions, methods of isolation, manner of manipulation, etc. are known
in the art. Under appropriate expression conditions, host cells can be used as a source of
recombinantly produced S. pneumoniae-derived peptides and polypeptides.

Advantageously, vectors may also include a transcription regulatory element (i.e.,
a promoter) operably linked to the S. pneumoniae portion. The promoter may optionally
contain operator portions and/or ribosome binding sites. Non-limiting examples of
bacterial promoters compatible with E. coli include: beta-lactamase (penicillinase)
promoter; lactose promoter; tryptophan (trp) promoter; araBAD (arabinose) operon
promoter; lambda-derived PL promoter and N gene ribosome binding site; and the hybrid
tac promoter derived from sequences of the trp and lacUV5 promoters. Non-limiting
examples of yeast promoters include 3-phosphoglycerate kinase promoter,
glyceraldehyde-3-phosphate dehydrogenase (GAPDH) promoter, galactokinase (GAL1)
promoter, galactoepimerase promoter, and alcohol dehydrogenase (ADH) promoter.
Suitable promoters for mammalian cells include without limitation viral promoters such
as that from Simian Virus 40 (SV40), Rous sarcoma virus (RSV), adenovirus (ADV),
and bovine papilloma virus (BPV). Mammalian cells may also require terminator
sequences, polyA addition sequences and enhancer sequences to increase expression.
Sequences which cause amplification of the gene may also be desirable. Furthermore,
sequences that facilitate secretion of the recombinant product from cells, including, but
not limited to, bacteria, yeast, and animal cells, such as secretory signal sequences and/or
prohormone pro region sequences, may also be included. These sequences are well
described in the art.

Nucleic acids encoding wild-type or variant S. pneumoniae-derived polypeptides 
may also be introduced into cells by recombination events. For example, such a
sequence can be introduced into a cell, and thereby effect homologous recombination at
the site of an endogenous gene or a sequence with substantial identity to the gene. Other
recombination-based methods such as nonhomologous recombinations or deletion of
endogenous genes by homologous recombination may also be used.

The nucleic acids of the present invention find use as templates for the
recombinant production of S. pneumoniae-derived peptides or polypeptides.

Identification and Use of S. pneumoniae Nucleic Acid Sequences

The disclosed S. pneumoniae polypeptide and nucleic acid sequences, or other
sequences that are contained within ORFs, including complete protein-coding sequences,
of which any of the disclosed S. pneumoniae-specific sequences forms a part, are useful
as target components for diagnosis and/or treatment of S. pneumoniae-caused infection.

It will be understood that the sequence of an entire protein-coding sequence of
which each disclosed nucleic acid sequence forms a part can be isolated and identified
based on each disclosed sequence. This can be achieved, for example, by using an
isolated nucleic acid encoding the disclosed sequence, or fragments thereof, to prime a
sequencing reaction with genomic S. pneumoniae DNA as template; this is followed by
sequencing the amplified product. The isolated nucleic acid encoding the disclosed
sequence, or fragments thereof, can also be hybridized to S. pneumoniae genomic
libraries to identify clones containing additional complete segments of the protein-coding
sequence of which the shorter sequence forms a part. Then, the entire protein-coding
sequence, or fragments thereof, or nucleic acids encoding all or part of the sequence, or
sequence-conservative or function-conservative variants thereof, may be employed in
practicing the present invention.

Preferred sequences are those that are useful in diagnostic and/or therapeutic 
applications. Diagnostic applications include without limitation nucleic-acid-based and
antibody-based methods for detecting bacterial infection. Therapeutic applications
include without limitation vaccines, passive immunotherapy, and drug treatments 
directed against gene products that are both unique to bacteria and essential for growth 
and/or replication of bacteria. 

Identification of Nucleic Acids Encoding Vaccine Components and Targets for Agents
Effective Against S. pneumoniae 

The disclosed S. pneumoniae genome sequence includes segments that direct the
synthesis of ribonucleic acids and polypeptides, as well as origins of replication,
promoters, other types of regulatory sequences, and intergenic nucleic acids. The
invention encompasses nucleic acids encoding immunogenic components of vaccines and
targets for agents effective against S. pneumoniae. Identification of said immunogenic
components involves the determination of the function of the disclosed sequences,
which can be achieved using a variety of approaches. Non-limiting examples of these
approaches are described briefly below.

Homology to known sequences:

Computer-assisted comparison of the disclosed S. pneumoniae sequences with 

5 previously reported sequences present in publicly available databases is useful for 

identifying functional S. pneumoniae nucleic acid and polypeptide sequences. It will be 
understood that protein-coding sequences, for example, may be compared as a whole, 
and that a high degree of sequence homology between two proteins (such as, for 
example, >80-90%) at the amino acid level indicates that the two proteins also possess 

10 some degree of functional homology, such as, for example, among enzymes involved in 
metabolism, DNA synthesis, or cell wall synthesis, and proteins involved in transport, 
cell division, etc. In addition, many structural features of particular protein classes have 
been identified and correlate with specific consensus sequences, such as, for example, 
binding domains for nucleotides, DNA, metal ions, and other small molecules; sites for 

15 covalent modifications such as phosphorylation, acylation, and the like; sites of 

protein:protein interactions, etc. These consensus sequences may be quite short and thus 
may represent only a fraction of the entire protein-coding sequence. Identification of 
such a feature in an S. pneumoniae sequence is therefore useful in determining the 
function of the encoded protein and identifying useful targets of antibacterial drugs. 
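
By way of illustration only, the search for a short consensus sequence can be sketched in Python. The example below assumes the widely used PROSITE-style pattern for the nucleotide-binding P-loop, [AG]-x(4)-G-K-[ST]; the function name `find_motif` and the test sequence are hypothetical and are not taken from the Sequence Listing.

```python
import re

# PROSITE-style pattern for the ATP/GTP-binding P-loop (Walker A) motif,
# [AG]-x(4)-G-K-[ST], written as a regular expression.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_motif(protein, pattern=P_LOOP):
    """Return (1-based start, matched text) for each non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in pattern.finditer(protein)]


# Hypothetical sequence for illustration only.
example = "MSTLRGAPGSGKSTLAKELGV"
print(find_motif(example))  # [(6, 'GAPGSGKS')]
```

A match of this kind only suggests a possible nucleotide-binding function; it is a starting point for further analysis rather than a functional assignment.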

20 Of particular relevance to the present invention are structural features that are 

common to secretory, transmembrane, and surface proteins, including secretion signal 
peptides and hydrophobic transmembrane domains. S. pneumoniae proteins identified as 
containing putative signal sequences and/or transmembrane domains are useful as 
immunogenic components of vaccines. 
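
As a rough sketch of how a hydrophobic transmembrane domain might be flagged, the following Python example uses a sliding-window average of the Kyte-Doolittle hydropathy scale. This is a substitute technique chosen for brevity and is not the discriminant analysis of Klein et al. referred to elsewhere herein; the window of 19 residues and the threshold of 1.6 are conventional values assumed for illustration, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(protein, window=19):
    """Average hydropathy of every full-length window along the sequence."""
    scores = [KD[aa] for aa in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(protein) - window + 1)
    ]


def has_candidate_tm_segment(protein, window=19, threshold=1.6):
    """True if any window is hydrophobic enough to suggest a membrane span."""
    return any(avg >= threshold for avg in window_hydropathy(protein, window))


# Hypothetical sequences for illustration only.
print(has_candidate_tm_segment("KK" + "L" * 19 + "DD"))  # True
print(has_candidate_tm_segment("MKKDEDKRSEENDKRQSTD"))   # False
```

Sequences shorter than the window yield no windows and are reported as lacking a candidate segment.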

25 Targets for therapeutic drugs according to the invention include, but are not 

limited to, polypeptides of the invention, whether unique to S. pneumoniae or not, that 
are essential for growth and/or viability of S. pneumoniae under at least one growth 
condition. Polypeptides essential for growth and/or viability can be determined by 
examining the effect of deleting and/or disrupting the genes, i.e., by so-called gene
"knockout". Alternatively, genetic footprinting can be used (Smith et al., 1995, Proc.
Natl. Acad. Sci. USA 92:6479-6483; Published International Application WO 94/26933;
U.S. Patent No. 5,612,180). Still other methods for assessing essentiality include the
ability to isolate conditional lethal mutations in the specific gene (e.g., temperature
sensitive mutations). Other useful targets for therapeutic drugs, which include 
polypeptides that are not essential for growth or viability per se but lead to loss of 
viability of the cell, can be used to target therapeutic agents to cells. 

5 Strain-specific sequences: 

Because of the evolutionary relationship between different S. pneumoniae strains, 
it is believed that the presently disclosed S. pneumoniae sequences are useful for 
identifying, and/or discriminating between, previously known and new S. pneumoniae 
strains. It is believed that other S. pneumoniae strains will exhibit at least 70% sequence 

10 homology with the presently disclosed sequence. Systematic and routine analyses of 

DNA sequences derived from samples containing S. pneumoniae strains, and comparison 
with the present sequence allows for the identification of sequences that can be used to 
discriminate between strains, as well as those that are common to all S. pneumoniae 
strains. In one embodiment, the invention provides nucleic acids, including probes, and 

15 peptide and polypeptide sequences that discriminate between different strains of S. 
pneumoniae. Strain-specific components can also be identified functionally by their 
ability to elicit or react with antibodies that selectively recognize one or more S. 
pneumoniae strains. 

In another embodiment, the invention provides nucleic acids, including probes, 

20 and peptide and polypeptide sequences that are common to all S. pneumoniae strains but 
are not found in other bacterial species. 

S. pneumoniae Polypeptides 

This invention encompasses isolated S. pneumoniae polypeptides encoded by the

25 disclosed S. pneumoniae genomic sequences, including the polypeptides of the invention 
contained in the Sequence Listing. Polypeptides of the invention are preferably at least 5 
amino acid residues in length. Using the DNA sequence information provided herein, the 
amino acid sequences of the polypeptides encompassed by the invention can be deduced 
using methods well-known in the art. It will be understood that the sequence of an entire 

nucleic acid encoding an S. pneumoniae polypeptide can be isolated and identified based
on an ORF that encodes only a fragment of the cognate protein-coding region. This can
be achieved, for example, by using the isolated nucleic acid encoding the ORF, or
fragments thereof, to prime a polymerase chain reaction with genomic S. pneumoniae
DNA as template; this is followed by sequencing the amplified product. 
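
The deduction of an amino acid sequence from a DNA sequence can be sketched as follows. This minimal Python example assumes the standard codon assignments, reads a single frame from the first base, and stops at the first stop codon; alternative bacterial start codons and the other five reading frames are not handled. The function name `translate` and the test sequence are hypothetical.

```python
from itertools import product

# Standard codon assignments, listed in TCAG order (TTT, TTC, TTA, TTG, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate one reading frame, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical sequence for illustration only.
print(translate("ATGGCTAAATAA"))  # MAK
```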

The polypeptides of the present invention, including function-conservative 
variants of the disclosed ORFs, may be isolated from wild-type or mutant S. pneumoniae 
5 cells, or from heterologous organisms or cells (including, but not limited to, bacteria, 
fungi, insect, plant, and mammalian cells) including S. pneumoniae into which an S.
pneumoniae-derived protein-coding sequence has been introduced and expressed.
Furthermore, the polypeptides may be part of recombinant fusion proteins. 

S. pneumoniae polypeptides of the invention can be chemically synthesized using 

commercially available automated procedures such as those referenced herein, including, without
limitation, exclusive solid phase synthesis, partial solid phase methods, fragment 
condensation or classical solution synthesis. The polypeptides are preferably prepared by 
solid phase peptide synthesis as described by Merrifield, 1963, J. Am. Chem. Soc. 
85:2149. The synthesis is carried out with amino acids that are protected at the alpha- 

15 amino terminus. Trifunctional amino acids with labile side-chains are also protected 
with suitable groups to prevent undesired chemical reactions from occurring during the 
assembly of the polypeptides. The alpha-amino protecting group is selectively removed 
to allow subsequent reaction to take place at the amino-terminus. The conditions for the 
removal of the alpha-amino protecting group do not remove the side-chain protecting 

20 groups. 

The alpha-amino protecting groups are those known to be useful in the art of 
stepwise polypeptide synthesis. Included are acyl type protecting groups, e.g., formyl, 
trifluoroacetyl, acetyl, aromatic urethane type protecting groups, e.g., benzyloxycarbonyl 
(Cbz), substituted benzyloxycarbonyl and 9-fluorenylmethyloxycarbonyl (Fmoc), 

25 aliphatic urethane protecting groups, e.g., t-butyloxycarbonyl (Boc), 

isopropyloxycarbonyl, cyclohexyloxycarbonyl, and alkyl type protecting groups, e.g., 
benzyl, triphenylmethyl. The preferred protecting group is Boc. The side-chain 
protecting groups for Tyr include tetrahydropyranyl, tert-butyl, trityl, benzyl, Cbz, 4-Br- 
Cbz and 2,6-dichlorobenzyl. The preferred side-chain protecting group for Tyr is 2,6- 

30 dichlorobenzyl. The side-chain protecting groups for Asp include benzyl, 2,6- 
dichlorobenzyl, methyl, ethyl and cyclohexyl. The preferred side-chain protecting group 
for Asp is cyclohexyl. The side-chain protecting groups for Thr and Ser include acetyl,
benzoyl, trityl, tetrahydropyranyl, benzyl, 2,6-dichlorobenzyl and Cbz. The preferred
protecting group for Thr and Ser is benzyl. The side-chain protecting groups for Arg 
include nitro, Tos, Cbz, adamantyloxycarbonyl and Boc. The preferred protecting group 
for Arg is Tos. The side-chain amino group of Lys may be protected with Cbz, 2-Cl-Cbz, 
5 Tos or Boc. The 2-Cl-Cbz group is the preferred protecting group for Lys. 

The side-chain protecting groups selected must remain intact during coupling and 
not be removed during the deprotection of the amino-terminus protecting group or during 
coupling conditions. The side-chain protecting groups must also be removable upon the 
completion of synthesis, using reaction conditions that will not alter the finished 
10 polypeptide. 

Solid phase synthesis is usually carried out from the carboxy-terminus by 
coupling the alpha-amino protected (side-chain protected) amino acid to a suitable solid 
support. An ester linkage is formed when the attachment is made to a chloromethyl or 
hydroxymethyl resin, and the resulting polypeptide will have a free carboxyl group at the 

15 C-terminus. Alternatively, when a benzhydrylamine or p-methylbenzhydrylamine resin 
is used, an amide bond is formed and the resulting polypeptide will have a carboxamide 
group at the C-terminus. These resins are commercially available, and their preparation 
was described by Stewart et al., 1984, Solid Phase Peptide Synthesis (2nd Edition),
Pierce Chemical Co., Rockford, IL. 

20 The C-terminal amino acid, protected at the side chain if necessary and at the 

alpha-amino group, is coupled to the benzhydrylamine resin using various activating 
agents including dicyclohexylcarbodiimide (DCC), N,N'-diisopropylcarbodiimide and
carbonyldiimidazole. Following the attachment to the resin support, the alpha-amino
protecting group is removed using trifluoroacetic acid (TFA) or HCl in dioxane at a

25 temperature between 0 and 25°C. Dimethylsulfide is added to the TFA after the 

introduction of methionine (Met) to suppress possible S-alkylation. After removal of the 
alpha-amino protecting group, the remaining protected amino acids are coupled stepwise 
in the required order to obtain the desired sequence. 

Various activating agents can be used for the coupling reactions including 

DCC, N,N'-diisopropylcarbodiimide, benzotriazol-1-yl-oxy-tris(dimethylamino)-
phosphonium hexafluorophosphate (BOP) and DCC-hydroxybenzotriazole (HOBt).
Each protected amino acid is used in excess (>2.0 equivalents), and the couplings are
usually carried out in N-methylpyrrolidone (NMP) or in DMF, CH2Cl2 or mixtures
thereof. The extent of completion of the coupling reaction is monitored at each stage, 
e.g., by the ninhydrin reaction as described by Kaiser et al., 1970, Anal. Biochem. 34:595.
In cases where incomplete coupling is found, the coupling reaction is repeated. The 
5 coupling reactions can be performed automatically with commercially available 
instruments. 

After the entire assembly of the desired polypeptide, the polypeptide-resin is 
cleaved with a reagent such as liquid HF for 1-2 hours at 0°C, which cleaves the 
polypeptide from the resin and removes all side-chain protecting groups. A scavenger 

10 such as anisole is usually used with the liquid HF to prevent cations formed during the 
cleavage from alkylating the amino acid residues present in the polypeptide. The 
polypeptide-resin may be deprotected with TFA/dithioethane prior to cleavage if desired. 

Side-chain to side-chain cyclization on the solid support requires the use of an 
orthogonal protection scheme which enables selective cleavage of the side-chain 

15 functions of acidic amino acids (e.g., Asp) and the basic amino acids (e.g., Lys). The 9- 
fluorenylmethyl (Fm) protecting group for the side-chain of Asp and the 9- 
fluorenylmethyloxycarbonyl (Fmoc) protecting group for the side-chain of Lys can be 
used for this purpose. In these cases, the side-chain protecting groups of the Boc- 
protected polypeptide-resin are selectively removed with piperidine in DMF. Cyclization 

20 is achieved on the solid support using various activating agents including DCC, 

DCC/HOBt or BOP. The HF reaction is carried out on the cyclized polypeptide-resin as 
described above. 

Methods for polypeptide purification are well-known in the art, including, 
without limitation, preparative disc-gel electrophoresis, isoelectric focusing, HPLC, 

25 reversed-phase HPLC, gel filtration, ion exchange and partition chromatography, and 
countercurrent distribution. For some purposes, it is preferable to produce the 
polypeptide in a recombinant system in which the S. pneumoniae protein contains an 
additional sequence tag that facilitates purification, such as, but not limited to, a 
polyhistidine sequence. The polypeptide can then be purified from a crude lysate of the 

30 host cell by chromatography on an appropriate solid-phase matrix. Alternatively, 
antibodies produced against a S. pneumoniae protein or against peptides derived 
therefrom can be used as purification reagents. Other purification methods are possible.

The present invention also encompasses derivatives and homologues of S.
pneumoniae-encoded polypeptides. For some purposes, nucleic acid sequences encoding
the peptides may be altered by substitutions, additions, or deletions that provide for 
functionally equivalent molecules, i.e., function-conservative variants. For example, one 
5 or more amino acid residues within the sequence can be substituted by another amino 
acid of similar properties, such as, for example, positively charged amino acids (arginine, 
lysine, and histidine); negatively charged amino acids (aspartate and glutamate); polar 
neutral amino acids; and non-polar amino acids. The isolated polypeptides may be 
modified by, for example, phosphorylation, sulfation, acylation, or other protein 
modifications. They may also be modified with a label capable of providing a detectable
signal, either directly or indirectly, including, but not limited to, radioisotopes and
fluorescent compounds.

To identify S. pneumoniae-derived polypeptides for use in the present invention,
essentially the complete genomic sequence of a virulent, methicillin-resistant isolate of
Streptococcus pneumoniae was analyzed. While, in very rare instances, a
nucleic acid sequencing error may be revealed, resolving a rare sequencing error is well 
within the art, and such an occurrence will not prevent one skilled in the art from 
practicing the invention. 

20 Also encompassed are any S. pneumoniae polypeptide sequences that are 

contained within the open reading frames (ORFs), including complete protein-coding 
sequences, of which any of SEQ ID NO: 2604 - SEQ ID NO: 5206 forms a part. Table 2, 
which is appended herewith and which forms part of the present specification, provides a 
putative identification of the particular function of a polypeptide which is encoded by 

25 each ORF. As a result, one skilled in the art can use the polypeptides of the present 
invention for commercial and industrial purposes consistent with the type of putative 
identification of the polypeptide. 

The present invention provides a library of S. pneumoniae-derived polypeptide
sequences, and a corresponding library of nucleic acid sequences encoding the 

30 polypeptides, wherein the polypeptides themselves, or polypeptides contained within 
ORFs of which they form a part, comprise sequences that are contemplated for use as 
components of vaccines. Non-limiting examples of such sequences are listed by SEQ ID
NO in Table 2, which is appended herewith and which forms part of the present
specification. 

The present invention also provides a library of S. pneumoniae-derived
polypeptide sequences, and a corresponding library of nucleic acid sequences encoding 
5 the polypeptides, wherein the polypeptides themselves, or polypeptides contained within 
ORFs of which they form a part, comprise sequences lacking homology to any known 
prokaryotic or eukaryotic sequences. Such libraries provide probes, primers, and markers 
which can be used to diagnose S. pneumoniae infection, including use as markers in 
epidemiological studies. Non-limiting examples of such sequences are listed by SEQ ID 
NO in Table 2, which is appended herewith and which forms part of the present specification.

The present invention also provides a library of S. pneumoniae-derived 
polypeptide sequences, and a corresponding library of nucleic acid sequences encoding 
the polypeptides, wherein the polypeptides themselves, or polypeptides contained within 
ORFs of which they form a part, comprise targets for therapeutic drugs. 

15 

Specific Example: Determination Of Candidate Protein Antigens For Antibody And 
Vaccine Development 

The selection of candidate protein antigens for vaccine development can be 
derived from the nucleic acids encoding S. pneumoniae polypeptides. First, the ORFs 

20 can be analyzed for homology to other known exported or membrane proteins and 

analyzed using the discriminant analysis described by Klein et al. (Klein, P., Kanehisa,
M., and DeLisi, C. (1985) Biochimica et Biophysica Acta 815, 468-476) for predicting
exported and membrane proteins. 

Homology searches can be performed using the BLAST algorithm contained in 

25 the Wisconsin Sequence Analysis Package (Genetics Computer Group, University 

Research Park, 575 Science Drive, Madison, WI 53711) to compare each predicted ORF
amino acid sequence with all sequences found in the current GenBank, SWISS-PROT 
and PIR databases. BLAST searches for local alignments between the ORF and the 
databank sequences and reports a probability score which indicates the probability of 

finding this sequence by chance in the database. ORFs with significant homology (e.g.,
probabilities lower than 1 x 10^-6 that the homology is only due to random chance) to
membrane or exported proteins represent protein antigens for vaccine development.
Possible functions can be assigned to S. pneumoniae genes based on sequence homology
to genes cloned in other organisms. 
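
The selection step described above can be sketched as a simple filter over homology search results. In the Python example below, the record layout (`Hit`), the keyword list, and the sample hits are all hypothetical; only the probability cutoff of 1 x 10^-6 is taken from the text. Parsing of actual search output is outside the scope of this sketch.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One database match for an ORF (hypothetical record layout)."""
    orf_id: str
    subject_description: str
    probability: float  # chance probability reported by the search


# Assumed keywords marking membrane or exported proteins in a description.
KEYWORDS = ("membrane", "exported", "secreted")


def candidate_antigens(hits, cutoff=1e-6):
    """ORF ids with a significant match to a membrane or exported protein."""
    selected = set()
    for hit in hits:
        description = hit.subject_description.lower()
        if hit.probability < cutoff and any(k in description for k in KEYWORDS):
            selected.add(hit.orf_id)
    return sorted(selected)


# Hypothetical hits for illustration only.
hits = [
    Hit("ORF1", "outer membrane protein", 1e-20),
    Hit("ORF2", "ribosomal protein L2", 1e-30),
    Hit("ORF3", "exported lipoprotein", 1e-3),
]
print(candidate_antigens(hits))  # ['ORF1']
```

Keyword matching on free-text descriptions is crude; in practice the annotation of each matched sequence would be reviewed before an ORF is accepted as a candidate.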

Discriminant analysis (Klein, et al. supra) can be used to examine the ORF amino 
acid sequences. This algorithm uses the intrinsic information contained in the ORF 
5 amino acid sequence and compares it to information derived from the properties of 
known membrane and exported proteins. This comparison predicts which proteins will 
be exported, membrane associated or cytoplasmic. ORF amino acid sequences identified 
as exported or membrane associated by this algorithm are likely protein antigens for 
vaccine development. 

10 

Production of Fragments and Analogs of S. pneumoniae Nucleic Acids and Polypeptides 

Based on the discovery of the S. pneumoniae gene products of the invention 
provided in the Sequence Listing, one skilled in the art can alter the disclosed structure 
(of S. pneumoniae genes), e.g., by producing fragments or analogs, and test the newly 
15 produced structures for activity. Examples of techniques known to those skilled in the 
relevant art which allow the production and testing of fragments and analogs are 
discussed below. These, or analogous methods, can be used to make and screen libraries
of polypeptides, e.g., libraries of random peptides or libraries of fragments or analogs of 
cellular proteins for the ability to bind S. pneumoniae polypeptides. Such screens are 
20 useful for the identification of inhibitors of S. pneumoniae. 
Generation of Fragments 

Fragments of a protein can be produced in several ways, e.g., recombinantly, by 
proteolytic digestion, or by chemical synthesis. Internal or terminal fragments of a 
polypeptide can be generated by removing one or more nucleotides from one end (for a 

25 terminal fragment) or both ends (for an internal fragment) of a nucleic acid which 
encodes the polypeptide. Expression of the mutagenized DNA produces polypeptide 
fragments. Digestion with "end-nibbling" endonucleases can thus generate DNAs which
encode an array of fragments. DNAs which encode fragments of a protein can also be
generated by random shearing, restriction digestion or a combination of the above- 

30 discussed methods. 

Fragments can also be chemically synthesized using techniques known in the art 
such as conventional Merrifield solid phase Fmoc or t-Boc chemistry. For example,
peptides of the present invention may be arbitrarily divided into fragments of desired
length with no overlap of the fragments, or divided into overlapping fragments of a 
desired length. 
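
The division of a peptide into non-overlapping or overlapping fragments can be sketched as follows. The function name `fragments` and its parameters are hypothetical; in this sketch any trailing residues that do not fill a complete fragment are dropped.

```python
def fragments(peptide, length, step=None):
    """Split a peptide into fragments of a fixed length.

    With step equal to length (the default) the fragments do not overlap;
    a smaller step yields overlapping fragments offset by `step` residues.
    """
    if step is None:
        step = length
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    return [
        peptide[i:i + length]
        for i in range(0, len(peptide) - length + 1, step)
    ]


# Hypothetical peptide for illustration only.
peptide = "ACDEFGHIKL"
print(fragments(peptide, 5))     # ['ACDEF', 'GHIKL']
print(fragments(peptide, 4, 2))  # ['ACDE', 'DEFG', 'FGHI', 'HIKL']
```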

5 Alteration of Nucleic Acids and Polypeptides: Random Methods 

Amino acid sequence variants of a protein can be prepared by random 
mutagenesis of DNA which encodes a protein or a particular domain or region of a 
protein. Useful methods include PCR mutagenesis and saturation mutagenesis. A library 
of random amino acid sequence variants can also be generated by the synthesis of a set of 
degenerate oligonucleotide sequences. (Methods for screening proteins in a library of
variants are described elsewhere herein.)
PCR Mutagenesis 

In PCR mutagenesis, reduced Taq polymerase fidelity is used to introduce 
random mutations into a cloned fragment of DNA (Leung et al., 1989, Technique 1:11- 
15 15). The DNA region to be mutagenized is amplified using the polymerase chain 

reaction (PCR) under conditions that reduce the fidelity of DNA synthesis by Taq DNA 

polymerase, e.g., by using a dGTP/dATP ratio of five and adding Mn2+ to the PCR
reaction. The pool of amplified DNA fragments is inserted into appropriate cloning
vectors to provide random mutant libraries. 

20 Saturation Mutagenesis 

Saturation mutagenesis allows for the rapid introduction of a large number of 
single base substitutions into cloned DNA fragments (Myers et al., 1985, Science
229:242). This technique includes generation of mutations, e.g., by chemical treatment
or irradiation of single-stranded DNA in vitro, and synthesis of a complementary DNA

25 strand. The mutation frequency can be modulated by modulating the severity of the 
treatment, and essentially all possible base substitutions can be obtained. Because this 
procedure does not involve a genetic selection for mutant fragments, both neutral
substitutions and those that alter function are obtained. The distribution of point
mutations is not biased toward conserved sequence elements. 

30 Degenerate Oligonucleotides 

A library of homologs can also be generated from a set of degenerate 
oligonucleotide sequences. Chemical synthesis of a degenerate sequence can be carried
out in an automatic DNA synthesizer, and the synthetic genes then ligated into an
appropriate expression vector. The synthesis of degenerate oligonucleotides is known in 
the art (see for example, Narang, SA (1983) Tetrahedron 39:3; Itakura et al. (1981) 
Recombinant DNA, Proc 3rd Cleveland Sympos. Macromolecules, ed. AG Walton, 
5 Amsterdam: Elsevier pp273-289; Itakura et al. (1984) Annu. Rev. Biochem. 53:323; 
Itakura et al. (1984) Science 198:1056; Ike et al. (1983) Nucleic Acid Res. 11:477). Such
techniques have been employed in the directed evolution of other proteins (see, for 
example, Scott et al. (1990) Science 249:386-390; Roberts et al. (1992) PNAS 89:2429- 
2433; Devlin et al. (1990) Science 249: 404-406; Cwirla et al. (1990) PNAS 87: 6378- 
10 6382; as well as U.S. Patents Nos. 5,223,409, 5,198,346, and 5,096,815). 
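
To illustrate what a degenerate oligonucleotide represents, the following Python sketch enumerates every specific sequence encoded by an oligonucleotide written with the standard IUPAC ambiguity codes. The function name `expand` is hypothetical, and the example is intended for short oligonucleotides only, since the number of sequences grows multiplicatively with each degenerate position.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(oligo):
    """List every specific sequence represented by a degenerate oligo."""
    choices = (IUPAC[base] for base in oligo.upper())
    return ["".join(seq) for seq in product(*choices)]


print(expand("AAR"))       # ['AAA', 'AAG'], the two lysine codons
print(len(expand("NNK")))  # 32 codons from a single NNK position
```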

Alteration of Nucleic Acids and Polypeptides: Methods for Directed Mutagenesis 

Non-random, or directed, mutagenesis techniques can be used to provide specific
sequences or mutations in specific regions. These techniques can be used to create 

15 variants which include, e.g., deletions, insertions, or substitutions, of residues of the 
known amino acid sequence of a protein. The sites for mutation can be modified 
individually or in series, e.g., by (1) substituting first with conserved amino acids and 
then with more radical choices depending upon results achieved, (2) deleting the target 
residue, or (3) inserting residues of the same or a different class adjacent to the located 

20 site, or combinations of options 1-3. 

Alanine Scanning Mutagenesis 
Alanine scanning mutagenesis is a useful method for identification of certain
residues or regions of the desired protein that are preferred locations or domains for
mutagenesis (Cunningham and Wells, Science 244:1081-1085, 1989). In alanine

scanning, a residue or group of target residues is identified (e.g., charged residues such
as Arg, Asp, His, Lys, and Glu) and replaced by a neutral or negatively charged amino 
acid (most preferably alanine or polyalanine). Replacement of an amino acid can affect 
the interaction of the amino acids with the surrounding aqueous environment in or 
outside the cell. Those domains demonstrating functional sensitivity to the substitutions 

30 are then refined by introducing further or other variants at or for the sites of substitution. 
Thus, while the site for introducing an amino acid sequence variation is predetermined, 
the nature of the mutation per se need not be predetermined. For example, to optimize
the performance of a mutation at a given site, alanine scanning or random mutagenesis
may be conducted at the target codon or region and the expressed desired protein subunit 
variants are screened for the optimal combination of desired activity. 
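
The set of single-residue variants examined in an alanine scan can be enumerated as sketched below. The function name `alanine_scan` is hypothetical; the default target residues are the charged residues named above (Arg, Asp, His, Lys, and Glu), and each variant is labeled in the usual form of original residue, position, and replacement.

```python
def alanine_scan(protein, targets="RDHKE"):
    """Map variant names (e.g. 'K2A') to sequences with one Ala replacement."""
    variants = {}
    for i, aa in enumerate(protein):
        if aa in targets:
            name = f"{aa}{i + 1}A"
            variants[name] = protein[:i] + "A" + protein[i + 1:]
    return variants


# Hypothetical sequence for illustration only.
print(alanine_scan("MKDL"))  # {'K2A': 'MADL', 'D3A': 'MKAL'}
```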
Oligonucleotide-Mediated Mutagenesis 
5 Oligonucleotide-mediated mutagenesis is a useful method for preparing 

substitution, deletion, and insertion variants of DNA, see, e.g., Adelman et al., (DNA 
2:183, 1983). Briefly, the desired DNA is altered by hybridizing an oligonucleotide 
encoding a mutation to a DNA template, where the template is the single-stranded form 
of a plasmid or bacteriophage containing the unaltered or native DNA sequence of the 

10 desired protein. After hybridization, a DNA polymerase is used to synthesize an entire 
second complementary strand of the template that will thus incorporate the 
oligonucleotide primer, and will code for the selected alteration in the desired protein 
DNA. Generally, oligonucleotides of at least 25 nucleotides in length are used. An 
optimal oligonucleotide will have 12 to 15 nucleotides that are completely 

complementary to the template on either side of the nucleotide(s) coding for the

mutation. This ensures that the oligonucleotide will hybridize properly to the single- 
stranded DNA template molecule. The oligonucleotides are readily synthesized using 
techniques known in the art such as that described by Crea et al. (Proc. Natl. Acad. Sci. 
USA, 75:5765[1978]). 
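
The layout of such an oligonucleotide, with 12 to 15 matched nucleotides on each side of the altered codon, can be sketched as follows. The function name `mutagenic_oligo` and the test sequence are hypothetical. The oligonucleotide is written in the sense of the coding strand and is therefore assumed to anneal to a single-stranded template carrying the complementary strand; melting temperature and secondary structure are not considered.

```python
def mutagenic_oligo(coding, codon_start, new_codon, flank=15):
    """Build an oligo that replaces one codon, keeping matched flanks.

    coding      -- coding-strand sequence of the unaltered DNA
    codon_start -- 0-based index of the first base of the codon to change
    new_codon   -- the three bases that encode the desired residue
    flank       -- matched bases kept on each side (12 to 15 per the text)
    """
    if len(new_codon) != 3:
        raise ValueError("new_codon must be three bases long")
    left = codon_start - flank
    right = codon_start + 3 + flank
    if left < 0 or right > len(coding):
        raise ValueError("not enough flanking sequence around the codon")
    return coding[left:codon_start] + new_codon + coding[codon_start + 3:right]


# Hypothetical sequence: an Asp codon (GAT) changed to an Ala codon (GCT).
coding = "A" * 15 + "GAT" + "C" * 15
oligo = mutagenic_oligo(coding, 15, "GCT", flank=12)
print(oligo, len(oligo))  # 27 nucleotides: 12 + 3 + 12
```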

20 Cassette Mutagenesis 

Another method for preparing variants, cassette mutagenesis, is based on the 
technique described by Wells et al. (Gene, 34:315 [1985]). The starting material is a
plasmid (or other vector) which includes the protein subunit DNA to be mutated. The 
codon(s) in the protein subunit DNA to be mutated are identified. There must be a 

25 unique restriction endonuclease site on each side of the identified mutation site(s). If no 
such restriction sites exist, they may be generated using the above-described 
oligonucleotide-mediated mutagenesis method to introduce them at appropriate locations 
in the desired protein subunit DNA. After the restriction sites have been introduced into 
the plasmid, the plasmid is cut at these sites to linearize it. A double-stranded 

30 oligonucleotide encoding the sequence of the DNA between the restriction sites but 
containing the desired mutation(s) is synthesized using standard procedures. The two 
strands are synthesized separately and then hybridized together using standard
techniques. This double-stranded oligonucleotide is referred to as the cassette. This
cassette is designed to have 3' and 5' ends that are compatible with the ends of the
linearized plasmid, such that it can be directly ligated to the plasmid. This plasmid now 
contains the mutated desired protein subunit DNA sequence. 
5 Combinatorial Mutagenesis 

Combinatorial mutagenesis can also be used to generate mutants (Ladner et al.,
WO 88/06630). In this method, the amino acid sequences for a group of homologs or 
other related proteins are aligned, preferably to promote the highest homology possible. 
All of the amino acids which appear at a given position of the aligned sequences can be 

10 selected to create a degenerate set of combinatorial sequences. The variegated library of 
variants is generated by combinatorial mutagenesis at the nucleic acid level, and is 
encoded by a variegated gene library. For example, a mixture of synthetic 
oligonucleotides can be enzymatically ligated into gene sequences such that the 
degenerate set of potential sequences is expressible as individual peptides, or
alternatively, as a set of larger fusion proteins containing the set of degenerate sequences.
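
The construction of a degenerate set of combinatorial sequences from an alignment can be sketched at the amino acid level as follows. The function names and the aligned sequences are hypothetical; the sketch assumes the sequences are already aligned to equal length and does not treat gap characters specially.

```python
from itertools import product


def position_sets(aligned):
    """Residues observed at each column of equal-length aligned sequences."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [sorted(set(column)) for column in zip(*aligned)]


def combinatorial_library(aligned):
    """Every sequence that combines the residues seen at each position."""
    return ["".join(combo) for combo in product(*position_sets(aligned))]


# Hypothetical aligned homologs for illustration only.
aligned = ["MKVL", "MRVL", "MKIL"]
print(position_sets(aligned))          # [['M'], ['K', 'R'], ['I', 'V'], ['L']]
print(combinatorial_library(aligned))  # ['MKIL', 'MKVL', 'MRIL', 'MRVL']
```

Note that the library contains MRIL, a combination not present in any of the input sequences, which is the point of the combinatorial approach.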

Other Modifications of S. pneumoniae Nucleic Acids and Polypeptides 

It is possible to modify the structure of an S. pneumoniae polypeptide for such 

purposes as increasing solubility or enhancing stability (e.g., shelf life ex vivo and
resistance to proteolytic degradation in vivo). A modified S. pneumoniae protein or

peptide can be produced in which the amino acid sequence has been altered, such as by 

amino acid substitution, deletion, or addition as described herein. 

An S. pneumoniae peptide can also be modified by substitution of cysteine 

residues preferably with alanine, serine, threonine, leucine or glutamic acid residues to 
25 minimize dimerization via disulfide linkages. In addition, amino acid side chains of 

fragments of the protein of the invention can be chemically modified. Another 

modification is cyclization of the peptide. 

In order to enhance stability and/or reactivity, an S. pneumoniae polypeptide can

be modified to incorporate one or more polymorphisms in the amino acid sequence of the 
30 protein resulting from any natural allelic variation. Additionally, D-amino acids, non- 
natural amino acids, or non-amino acid analogs can be substituted or added to produce a 

modified protein within the scope of this invention. Furthermore, an S. pneumoniae
polypeptide can be modified using polyethylene glycol (PEG) according to the method of
A. Sehon and co-workers (Wie et al., supra) to produce a protein conjugated with PEG. 
In addition, PEG can be added during chemical synthesis of the protein. Other 
modifications of S. pneumoniae proteins include reduction/alkylation (Tarr, Methods of 
5 Protein Microcharacterization, J. E. Silver ed., Humana Press, Clifton NJ 155-194 

(1986)); acylation (Tarr, supra); chemical coupling to an appropriate carrier (Mishell and 
Shiigi, eds, Selected Methods in Cellular Immunology, WH Freeman, San Francisco, CA 
(1980); U.S. Patent 4,939,239); or mild formalin treatment (Marsh, (1971) Int. Arch. of
Allergy and Appl. Immunol., 41: 199-215).

10 To facilitate purification and potentially increase solubility of an S. pneumoniae 

protein or peptide, it is possible to add an amino acid fusion moiety to the peptide 
backbone. For example, hexa-histidine can be added to the protein for purification by 
immobilized metal ion affinity chromatography (Hochuli, E. et al., (1988) 
Bio/Technology, 6: 1321 - 1325). In addition, to facilitate isolation of peptides free of 

15 irrelevant sequences, specific endoprotease cleavage sites can be introduced between the 
sequences of the fusion moiety and the peptide. 

To potentially aid proper antigen processing of epitopes within an S. pneumoniae 
polypeptide, canonical protease sensitive sites can be engineered between regions, each 
comprising at least one epitope via recombinant or synthetic methods. For example, 

20 charged amino acid pairs, such as KK or RR, can be introduced between regions within a 
protein or fragment during recombinant construction thereof. The resulting peptide can 
be rendered sensitive to cleavage by cathepsin and/or other trypsin-like enzymes which 
would generate portions of the protein containing one or more epitopes. In addition, such 
charged amino acid residues can result in an increase in the solubility of the peptide. 

25 

Primary Methods for Screening Polypeptides and Analogs 

Various techniques are known in the art for screening generated mutant gene
products. Techniques for screening large gene libraries often include cloning the gene
library into replicable expression vectors, transforming appropriate cells with the
resulting library of vectors, and expressing the genes under conditions in which detection
of a desired activity, e.g., in this case, binding to an S. pneumoniae polypeptide or an
interacting protein, facilitates relatively easy isolation of the vector encoding the gene
whose product was detected. Each of the techniques described below is amenable to high
through-put analysis for screening large numbers of sequences created, e.g., by random
mutagenesis techniques.

Two Hybrid Systems 

Two hybrid assays, such as the system described above (as with the other
screening methods described herein), can be used to identify polypeptides, e.g., fragments
or analogs of a naturally-occurring S. pneumoniae polypeptide, e.g., of cellular proteins,
or of randomly generated polypeptides which bind to an S. pneumoniae protein. (The S.
pneumoniae domain is used as the bait protein and the library of variants is expressed as
prey fusion proteins.) In an analogous fashion, a two hybrid assay (as with the other
screening methods described herein) can be used to find polypeptides which bind an S.
pneumoniae polypeptide.

Display Libraries 

In one approach to screening assays, the candidate peptides are displayed on the
surface of a cell or viral particle, and the ability of particular cells or viral particles to
bind an appropriate receptor protein via the displayed product is detected in a "panning
assay". For example, the gene library can be cloned into the gene for a surface membrane
protein of a bacterial cell, and the resulting fusion protein detected by panning (Ladner et
al., WO 88/06630; Fuchs et al. (1991) Bio/Technology 9:1370-1371; and Goward et al.
(1992) TIBS 18:136-140). In a similar fashion, a detectably labeled ligand can be used to
score for potentially functional peptide homologs. Fluorescently labeled ligands, e.g.,
receptors, can be used to detect homologs which retain ligand-binding activity. The use
of fluorescently labeled ligands allows cells to be visually inspected and separated under
a fluorescence microscope, or, where the morphology of the cell permits, to be separated
by a fluorescence-activated cell sorter.

A gene library can be expressed as a fusion protein on the surface of a viral
particle. For instance, in the filamentous phage system, foreign peptide sequences can be
expressed on the surface of infectious phage, thereby conferring two significant benefits.
First, since these phage can be applied to affinity matrices at concentrations well over
10^13 phage per milliliter, a large number of phage can be screened at one time. Second,
since each infectious phage displays a gene product on its surface, if a particular phage is
recovered from an affinity matrix in low yield, the phage can be amplified by another
round of infection. The group of almost identical E. coli filamentous phages M13, fd,
and f1 are most often used in phage display libraries. Either of the phage gIII or gVIII
coat proteins can be used to generate fusion proteins without disrupting the ultimate
packaging of the viral particle. Foreign epitopes can be expressed at the NH2-terminal
end of pIII and phage bearing such epitopes recovered from a large excess of phage
lacking this epitope (Ladner et al. PCT publication WO 90/02909; Garrard et al., PCT
publication WO 92/09690; Marks et al. (1992) J. Biol. Chem. 267:16007-16010;
Griffiths et al. (1993) EMBO J. 12:725-734; Clackson et al. (1991) Nature 352:624-628;
and Barbas et al. (1992) PNAS 89:4457-4461).

A common approach uses the maltose receptor of E. coli (the outer membrane
protein, LamB) as a peptide fusion partner (Charbit et al. (1986) EMBO J. 5, 3029-3037).
Oligonucleotides have been inserted into plasmids encoding the LamB gene to produce
peptides fused into one of the extracellular loops of the protein. These peptides are
available for binding to ligands, e.g., to antibodies, and can elicit an immune response
when the cells are administered to animals. Other cell surface proteins, e.g., OmpA
(Schorr et al. (1991) Vaccines 91, pp. 387-392), PhoE (Agterberg, et al. (1990) Gene 88,
37-45), and PAL (Fuchs et al. (1991) Bio/Tech 9, 1369-1372), as well as large bacterial
surface structures have served as vehicles for peptide display. Peptides can be fused to
pilin, a protein which polymerizes to form the pilus, a conduit for interbacterial exchange
of genetic information (Thiry et al. (1989) Appl. Environ. Microbiol. 55, 984-993).
Because of its role in interacting with other cells, the pilus provides a useful support for
the presentation of peptides to the extracellular environment. Another large surface
structure used for peptide display is the bacterial motive organ, the flagellum. Fusion of
peptides to the subunit protein flagellin offers a dense array of many peptide copies on
the host cells (Kuwajima et al. (1988) Bio/Tech. 6, 1080-1083). Surface proteins of other
bacterial species have also served as peptide fusion partners. Examples include the
Staphylococcus protein A and the outer membrane IgA protease of Neisseria (Hansson et
al. (1992) J. Bacteriol. 174, 4239-4245 and Klauser et al. (1990) EMBO J. 9, 1991-1999).

In the filamentous phage systems and the LamB system described above, the
physical link between the peptide and its encoding DNA occurs by the containment of the
DNA within a particle (cell or phage) that carries the peptide on its surface. Capturing
the peptide captures the particle and the DNA within. An alternative scheme uses the
DNA-binding protein LacI to form a link between peptide and DNA (Cull et al. (1992)
PNAS USA 89:1865-1869). This system uses a plasmid containing the LacI gene with an
oligonucleotide cloning site at its 3'-end. Under the controlled induction by arabinose, a
LacI-peptide fusion protein is produced. This fusion retains the natural ability of LacI to
bind to a short DNA sequence known as the LacO operator (LacO). By installing two copies
of LacO on the expression plasmid, the LacI-peptide fusion binds tightly to the plasmid
that encoded it. Because the plasmids in each cell contain only a single oligonucleotide
sequence and each cell expresses only a single peptide sequence, the peptides become
specifically and stably associated with the DNA sequence that directed their synthesis.
The cells of the library are gently lysed and the peptide-DNA complexes are exposed to a
matrix of immobilized receptor to recover the complexes containing active peptides. The
associated plasmid DNA is then reintroduced into cells for amplification and DNA
sequencing to determine the identity of the peptide ligands. As a demonstration of the
practical utility of the method, a large random library of dodecapeptides was made and
selected on a monoclonal antibody raised against the opioid peptide dynorphin B. A
cohort of peptides was recovered, all related by a consensus sequence corresponding to a
six-residue portion of dynorphin B (Cull et al. (1992) Proc. Natl. Acad. Sci. U.S.A.
89:1865-1869).

This scheme, sometimes referred to as peptides-on-plasmids, differs in two
important ways from the phage display methods. First, the peptides are attached to the
C-terminus of the fusion protein, resulting in the display of the library members as
peptides having free carboxy termini. Both of the filamentous phage coat proteins, pIII
and pVIII, are anchored to the phage through their C-termini, and the guest peptides are
placed into the outward-extending N-terminal domains. In some designs, the phage-
displayed peptides are presented right at the amino terminus of the fusion protein
(Cwirla, et al. (1990) Proc. Natl. Acad. Sci. U.S.A. 87, 6378-6382). A second difference
is the set of biological biases affecting the population of peptides actually present in the
libraries. The LacI fusion molecules are confined to the cytoplasm of the host cells. The
phage coat fusions are exposed briefly to the cytoplasm during translation but are rapidly
secreted through the inner membrane into the periplasmic compartment, remaining
anchored in the membrane by their C-terminal hydrophobic domains, with the N-termini,
containing the peptides, protruding into the periplasm while awaiting assembly into
phage particles. The peptides in the LacI and phage libraries may differ significantly as a
result of their exposure to different proteolytic activities. The phage coat proteins require
transport across the inner membrane and signal peptidase processing as a prelude to
incorporation into phage. Certain peptides exert a deleterious effect on these processes
and are underrepresented in the libraries (Gallop et al. (1994) J. Med. Chem. 37(9):1233-
1251). These particular biases are not a factor in the LacI display system.

The number of small peptides available in recombinant random libraries is
enormous. Libraries of 10^7-10^9 independent clones are routinely prepared. Libraries as
large as 10^11 recombinants have been created, but this size approaches the practical limit
for clone libraries. This limitation in library size occurs at the step of transforming the
DNA containing randomized segments into the host bacterial cells. To circumvent this
limitation, an in vitro system based on the display of nascent peptides in polysome
complexes has recently been developed. This display library method has the potential of
producing libraries 3-6 orders of magnitude larger than the currently available
phage/phagemid or plasmid libraries. Furthermore, the construction of the libraries,
expression of the peptides, and screening, is done in an entirely cell-free format.
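
To put these library sizes in context, the short Python sketch below compares a library size against the number of distinct peptides of a given length. It assumes the 20 standard amino acids and an idealized library in which every clone encodes a different sequence; the function names are illustrative and not part of the described methods.

```python
AMINO_ACIDS = 20  # assumption: the 20 standard amino acids at every position


def sequence_space(length: int) -> int:
    """Number of distinct peptides of the given length."""
    return AMINO_ACIDS ** length


def max_coverage(library_size: int, length: int) -> float:
    """Upper bound on the fraction of sequence space a library can sample."""
    return min(1.0, library_size / sequence_space(length))


def longest_fully_sampled(library_size: int) -> int:
    """Largest peptide length whose full sequence space fits in the library."""
    n = 0
    while AMINO_ACIDS ** (n + 1) <= library_size:
        n += 1
    return n


clone_limit = 10 ** 11             # practical limit for clone libraries per the text
cell_free = clone_limit * 10 ** 3  # 3 orders of magnitude larger (low end of 3-6)
```

Under these assumptions, a library of 10^11 clones could at best sample every octapeptide, whereas a library three orders of magnitude larger could in principle sample every decapeptide.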

In one application of this method (Gallop et al. (1994) J. Med. Chem. 37(9):1233-
1251), a molecular DNA library encoding 10^12 decapeptides was constructed and the
library expressed in an E. coli S30 in vitro coupled transcription/translation system.
Conditions were chosen to stall the ribosomes on the mRNA, causing the accumulation
of a substantial proportion of the RNA in polysomes and yielding complexes containing
nascent peptides still linked to their encoding RNA. The polysomes are sufficiently
robust to be affinity purified on immobilized receptors in much the same way as the more
conventional recombinant peptide display libraries are screened. RNA from the bound
complexes is recovered, converted to cDNA, and amplified by PCR to produce a
template for the next round of synthesis and screening. The polysome display method
can be coupled to the phage display system. Following several rounds of screening,
cDNA from the enriched pool of polysomes was cloned into a phagemid vector. This
vector serves as both a peptide expression vector, displaying peptides fused to the coat
proteins, and as a DNA sequencing vector for peptide identification. By expressing the
polysome-derived peptides on phage, one can either continue the affinity selection
procedure in this format or assay the peptides on individual clones for binding activity in
a phage ELISA, or for binding specificity in a competition phage ELISA (Barret, et al.
(1992) Anal. Biochem. 204, 357-364). To identify the sequences of the active peptides
one sequences the DNA produced by the phagemid host.

Secondary Screening of Polypeptides and Analogs 

The high through-put assays described above can be followed by secondary
screens in order to identify further biological activities which will, e.g., allow one skilled
in the art to differentiate agonists from antagonists. The type of secondary screen used
will depend on the desired activity that needs to be tested. For example, an assay can be
developed in which the ability to inhibit an interaction between a protein of interest and
its respective ligand can be used to identify antagonists from a group of peptide
fragments isolated through one of the primary screens described above.

Therefore, methods for generating fragments and analogs and testing them for
activity are known in the art. Once the core sequence of interest is identified, it is routine
for one skilled in the art to obtain analogs and fragments.

Peptide Mimetics of S. pneumoniae Polypeptides 

The invention also provides for reduction of the protein binding domains of the
subject S. pneumoniae polypeptides to generate mimetics, e.g. peptide or non-peptide
agents. The peptide mimetics are able to disrupt binding of a polypeptide to its counter
ligand, e.g., in the case of an S. pneumoniae polypeptide binding to a naturally occurring
ligand. The critical residues of a subject S. pneumoniae polypeptide which are involved
in molecular recognition of a polypeptide can be determined and used to generate S.
pneumoniae-derived peptidomimetics which competitively or noncompetitively inhibit
binding of the S. pneumoniae polypeptide with an interacting polypeptide (see, for
example, European patent applications EP-412,762A and EP-B31,080A).

For example, scanning mutagenesis can be used to map the amino acid residues
of a particular S. pneumoniae polypeptide involved in binding an interacting polypeptide,
and peptidomimetic compounds (e.g. diazepine or isoquinoline derivatives) can be generated
which mimic those residues in binding to an interacting polypeptide, and which therefore
can inhibit binding of an S. pneumoniae polypeptide to an interacting polypeptide and
thereby interfere with the function of the S. pneumoniae polypeptide. For instance, non-
hydrolyzable peptide analogs of such residues can be generated using benzodiazepine
(e.g., see Freidinger et al. in Peptides: Chemistry and Biology, G.R. Marshall ed.,
ESCOM Publisher: Leiden, Netherlands, 1988), azepine (e.g., see Huffman et al. in
Peptides: Chemistry and Biology, G.R. Marshall ed., ESCOM Publisher: Leiden,
Netherlands, 1988), substituted gamma lactam rings (Garvey et al. in Peptides: Chemistry
and Biology, G.R. Marshall ed., ESCOM Publisher: Leiden, Netherlands, 1988), keto-
methylene pseudopeptides (Ewenson et al. (1986) J Med Chem 29:295; and Ewenson et
al. in Peptides: Structure and Function (Proceedings of the 9th American Peptide
Symposium) Pierce Chemical Co. Rockland, IL, 1985), β-turn dipeptide cores (Nagai et
al. (1985) Tetrahedron Lett 26:647; and Sato et al. (1986) J Chem Soc Perkin Trans
1:1231), and β-aminoalcohols (Gordon et al. (1985) Biochem Biophys Res Commun
126:419; and et al. (1986) Biochem Biophys Res Commun 134:71).

Vaccine Formulations for S. pneumoniae Nucleic Acids and Polypeptides

This invention also features vaccine compositions for protection against infection
by S. pneumoniae, a gram-positive bacterium, or for treatment of S. pneumoniae
infection. In one embodiment, the vaccine compositions contain one or
more immunogenic components such as a surface protein from S. pneumoniae, or portion
thereof, and a pharmaceutically acceptable carrier. Nucleic acids within the scope of the
invention are exemplified by the nucleic acids of the invention contained in the Sequence
Listing which encode S. pneumoniae surface proteins. Any nucleic acid encoding an
immunogenic S. pneumoniae protein, or portion thereof, which is capable of expression
in a cell, can be used in the present invention. These vaccines have therapeutic and
prophylactic utilities.

One aspect of the invention provides a vaccine composition for protection against
infection by S. pneumoniae which contains at least one immunogenic fragment of an S.
pneumoniae protein and a pharmaceutically acceptable carrier. Preferred fragments
include peptides of at least about 10 amino acid residues in length, preferably about 10-
20 amino acid residues in length, and more preferably about 12-16 amino acid residues in
length.

Immunogenic components of the invention can be obtained, for example, by
screening polypeptides recombinantly produced from the corresponding fragment of the
nucleic acid encoding the full-length S. pneumoniae protein. In addition, fragments can
be chemically synthesized using techniques known in the art such as conventional
Merrifield solid phase Fmoc or t-Boc chemistry.
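
One simple way to enumerate candidate fragments in the preferred length range for synthesis or recombinant screening is to tile the full-length sequence with overlapping windows. The Python sketch below is illustrative only: the window length of 14 falls within the 12-16 residue range stated above, but the step of 4 residues, the function name, and the placeholder sequence are assumptions.

```python
def tile_peptides(protein: str, length: int = 14, step: int = 4):
    """Return overlapping peptides of fixed length covering the whole protein."""
    if not 1 <= step <= length:
        raise ValueError("step must be between 1 and the peptide length")
    if len(protein) <= length:
        return [protein]
    last_start = len(protein) - length
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # extra window so the C-terminus is covered
    return [protein[s:s + length] for s in starts]


# Placeholder 31-residue sequence, not an S. pneumoniae protein.
sequence = ("ACDEFGHIKLMNPQRSTVWY" * 2)[:31]
peptides = tile_peptides(sequence)
```

Because the step is smaller than the window, any stretch of up to 11 consecutive residues appears intact in at least one peptide under these settings.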

In one embodiment, immunogenic components are identified by the ability of the
peptide to stimulate T cells. Peptides which stimulate T cells, as determined by, for
example, T cell proliferation or cytokine secretion are defined herein as comprising at
least one T cell epitope. T cell epitopes are believed to be involved in initiation and
perpetuation of the immune response to the protein allergen which is responsible for the
clinical symptoms of allergy. These T cell epitopes are thought to trigger early events at
the level of the T helper cell by binding to an appropriate HLA molecule on the surface
of an antigen presenting cell, thereby stimulating the T cell subpopulation with the
relevant T cell receptor for the epitope. These events lead to T cell proliferation,
lymphokine secretion, local inflammatory reactions, recruitment of additional immune
cells to the site of antigen/T cell interaction, and activation of the B cell cascade, leading
to the production of antibodies. A T cell epitope is the basic element, or smallest unit of
recognition by a T cell receptor, where the epitope comprises amino acids essential to
receptor recognition (e.g., approximately 6 or 7 amino acid residues). Amino acid
sequences which mimic those of the T cell epitopes are within the scope of this
invention.

Screening immunogenic components can be accomplished using one or more of
several different assays. For example, in vitro, peptide T cell stimulatory activity is
assayed by contacting a peptide known or suspected of being immunogenic with an
antigen presenting cell which presents appropriate MHC molecules in a T cell culture.
Presentation of an immunogenic S. pneumoniae peptide in association with appropriate
MHC molecules to T cells in conjunction with the necessary co-stimulation has the effect
of transmitting a signal to the T cell that induces the production of increased levels of
cytokines, particularly of interleukin-2 and interleukin-4. The culture supernatant can be
obtained and assayed for interleukin-2 or other known cytokines. For example, any one
of several conventional assays for interleukin-2 can be employed, such as the assay
described in Proc. Natl. Acad. Sci. USA, 86: 1333 (1989), the pertinent portions of which
are incorporated herein by reference. A kit for an assay for the production of interferon is
also available from Genzyme Corporation (Cambridge, MA).

Alternatively, a common assay for T cell proliferation entails measuring tritiated
thymidine incorporation. The proliferation of T cells can be measured in vitro by
determining the amount of 3H-labeled thymidine incorporated into the replicating DNA
of cultured cells. Therefore, the rate of DNA synthesis and, in turn, the rate of cell
division can be quantified.
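
Thymidine incorporation data of this kind are commonly summarized as a stimulation index, the ratio of mean counts per minute (cpm) in peptide-stimulated cultures to mean cpm in unstimulated control cultures. The Python sketch below illustrates that calculation; the index itself, the cutoff of 2.0, the function names, and the replicate values are assumptions for illustration and are not taken from the methods described herein.

```python
from statistics import mean


def stimulation_index(stimulated_cpm, control_cpm):
    """Ratio of mean cpm with peptide to mean cpm of unstimulated controls."""
    background = mean(control_cpm)
    if background <= 0:
        raise ValueError("control cpm must be positive")
    return mean(stimulated_cpm) / background


def is_stimulatory(index, threshold=2.0):
    """Assumed cutoff: treat an index at or above the threshold as a response."""
    return index >= threshold


# Hypothetical triplicate wells.
si = stimulation_index([3000, 3300, 2700], [1000, 900, 1100])
responded = is_stimulatory(si)
```

Normalizing to the control wells allows peptides tested on different days or with different cell preparations to be compared on a common scale.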

Vaccine compositions of the invention containing immunogenic components
(e.g., S. pneumoniae polypeptide or fragment thereof or nucleic acid encoding an S.
pneumoniae polypeptide or fragment thereof) preferably include a pharmaceutically
acceptable carrier. The term "pharmaceutically acceptable carrier" refers to a carrier that
does not cause an allergic reaction or other untoward effect in patients to whom it is
administered. Suitable pharmaceutically acceptable carriers include, for example, one or
more of water, saline, phosphate buffered saline, dextrose, glycerol, ethanol and the like,
as well as combinations thereof. Pharmaceutically acceptable carriers may further
comprise minor amounts of auxiliary substances such as wetting or emulsifying agents,
preservatives or buffers, which enhance the shelf life or effectiveness of the antibody.
For vaccines of the invention containing S. pneumoniae polypeptides, the polypeptide is
co-administered with a suitable adjuvant.

It will be apparent to those of skill in the art that the therapeutically effective
amount of DNA or protein of this invention will depend, inter alia, upon the
administration schedule, the unit dose of antibody administered, whether the protein or
DNA is administered in combination with other therapeutic agents, the immune status
and health of the patient, and the therapeutic activity of the particular protein or DNA.

Vaccine compositions are conventionally administered parenterally, e.g., by
injection, either subcutaneously or intramuscularly. Methods for intramuscular
immunization are described by Wolff et al. (1990) Science 247: 1465-1468 and by
Sedegah et al. (1994) Immunology 91: 9866-9870. Other modes of administration
include oral and pulmonary formulations, suppositories, and transdermal applications.
Oral immunization is preferred over parenteral methods for inducing protection against
infection by S. pneumoniae (Cain et al. (1993) Vaccine 11: 637-642). Oral formulations
include such normally employed excipients as, for example, pharmaceutical grades of
mannitol, lactose, starch, magnesium stearate, sodium saccharine, cellulose, magnesium
carbonate, and the like.

The vaccine compositions of the invention can include an adjuvant, including, but
not limited to aluminum hydroxide; N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-
MDP); N-acetyl-nor-muramyl-L-alanyl-D-isoglutamine (CGP 11637, referred to as nor-
MDP); N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1'-2'-dipalmitoyl-sn-
glycero-3-hydroxyphosphoryloxy)-ethylamine (CGP 19835A, referred to as MTP-PE);
RIBI, which contains three components from bacteria, namely monophosphoryl lipid A,
trehalose dimycolate, and cell wall skeleton (MPL + TDM + CWS), in a 2% squalene/Tween
80 emulsion; and cholera toxin. Others which may be used are non-toxic derivatives of
cholera toxin, including its B subunit, and/or conjugates or genetically engineered fusions
of the S. pneumoniae polypeptide with cholera toxin or its B subunit, procholeragenoid,
fungal polysaccharides, including schizophyllan, muramyl dipeptide, muramyl dipeptide
derivatives, phorbol esters, labile toxin of E. coli, non-S. pneumoniae bacterial lysates,
block polymers or saponins.

Other suitable delivery methods include biodegradable microcapsules or immuno-
stimulating complexes (ISCOMs), cochleates, or liposomes, genetically engineered
attenuated live vectors such as viruses or bacteria, and recombinant (chimeric) virus-like
particles, e.g., bluetongue. The amount of adjuvant employed will depend on the type of
adjuvant used. For example, when the mucosal adjuvant is cholera toxin, it is suitably
used in an amount of 5 mg to 50 mg, for example 10 mg to 35 mg. When used in the
form of microcapsules, the amount used will depend on the amount employed in the
matrix of the microcapsule to achieve the desired dosage. The determination of this
amount is within the skill of a person of ordinary skill in the art.

Carrier systems in humans may include enteric release capsules protecting the
antigen from the acidic environment of the stomach, and including S. pneumoniae
polypeptide in an insoluble form as fusion proteins. Suitable carriers for the vaccines of
the invention are enteric coated capsules and polylactide-glycolide microspheres.
Suitable diluents are 0.2 N NaHCO3 and/or saline.

Vaccines of the invention can be administered as a primary prophylactic agent in
adults or in children, as a secondary prevention, after successful eradication of S.
pneumoniae in an infected host, or as a therapeutic agent with the aim of inducing an immune
response in a susceptible host to prevent infection by S. pneumoniae. The vaccines of the
invention are administered in amounts readily determined by persons of ordinary skill in
the art. Thus, for adults a suitable dosage will be in the range of 10 mg to 10 g,
preferably 10 mg to 100 mg. A suitable dosage for adults will also be in the range of 5
mg to 500 mg. Similar dosage ranges will be applicable for children. Those skilled in
the art will recognize that the optimal dose may be more or less depending upon the
patient's body weight, disease, the route of administration, and other factors. Those
skilled in the art will also recognize that appropriate dosage levels can be obtained based
on results with known oral vaccines such as, for example, a vaccine based on an E. coli
lysate (6 mg dose daily up to total of 540 mg) and with an enterotoxigenic E. coli purified
antigen (4 doses of 1 mg) (Schulman et al., J. Urol. 150:917-921 (1993); Boedecker et
al., American Gastroenterological Assoc. 999:A-222 (1993)). The number of doses will
depend upon the disease, the formulation, and efficacy data from clinical trials. Without
intending any limitation as to the course of treatment, the treatment can be administered
over 3 to 8 doses for a primary immunization schedule over 1 month (Boedeker,
American Gastroenterological Assoc. 888:A-222 (1993)).

In a preferred embodiment, a vaccine composition of the invention can be based
on a killed whole E. coli preparation with an immunogenic fragment of an S. pneumoniae
protein of the invention expressed on its surface or it can be based on an E. coli lysate,
wherein the killed E. coli acts as a carrier or an adjuvant.

It will be apparent to those skilled in the art that some of the vaccine
compositions of the invention are useful only for preventing S. pneumoniae infection,
some are useful only for treating S. pneumoniae infection, and some are useful for both
preventing and treating S. pneumoniae infection. In a preferred embodiment, the vaccine
composition of the invention provides protection against S. pneumoniae infection by
stimulating humoral and/or cell-mediated immunity against S. pneumoniae. It should be
understood that amelioration of any of the symptoms of S. pneumoniae infection is a
desirable clinical goal, including a lessening of the dosage of medication used to treat S.
pneumoniae-caused disease, or an increase in the production of antibodies in the serum
or mucus of patients.

Antibodies Reactive With S. pneumoniae Polypeptides

The invention also includes antibodies specifically reactive with the subject S.
pneumoniae polypeptide. Anti-protein/anti-peptide antisera or monoclonal antibodies
can be made by standard protocols (See, for example, Antibodies: A Laboratory Manual
ed. by Harlow and Lane (Cold Spring Harbor Press: 1988)). A mammal such as a mouse,
a hamster or rabbit can be immunized with an immunogenic form of the peptide.
Techniques for conferring immunogenicity on a protein or peptide include conjugation to
carriers or other techniques well known in the art. An immunogenic portion of the
subject S. pneumoniae polypeptide can be administered in the presence of adjuvant. The
progress of immunization can be monitored by detection of antibody titers in plasma or
serum. Standard ELISA or other immunoassays can be used with the immunogen as
antigen to assess the levels of antibodies.

In a preferred embodiment, the subject antibodies are immunospecific for
antigenic determinants of the S. pneumoniae polypeptides of the invention, e.g. antigenic
determinants of a polypeptide of the invention contained in the Sequence Listing, or a
closely related human or non-human mammalian homolog (e.g., 90% homologous, more
preferably at least 95% homologous). In yet a further preferred embodiment of the
invention, the anti-S. pneumoniae antibodies do not substantially cross react (i.e., react
specifically) with a protein which is, for example, less than 80 percent homologous to a
sequence of the invention contained in the Sequence Listing. By "not substantially cross
react", it is meant that the antibody has a binding affinity for a non-homologous protein
which is less than 10 percent, more preferably less than 5 percent, and even more
preferably less than 1 percent, of the binding affinity for a protein of the invention
contained in the Sequence Listing. In a most preferred embodiment, there is no cross-
reactivity between bacterial and mammalian antigens.
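
The cross-reactivity thresholds above amount to a ratio test on two measured affinities, which the following Python sketch illustrates. It assumes affinities are expressed as association constants, so that larger values mean tighter binding; the function name, tier labels, and example values are hypothetical.

```python
def cross_reactivity_tier(target_affinity: float, off_target_affinity: float) -> str:
    """Classify an antibody by off-target affinity relative to target affinity.

    Both affinities are assumed to be association constants in the same units.
    """
    if target_affinity <= 0:
        raise ValueError("target affinity must be positive")
    ratio = off_target_affinity / target_affinity
    if ratio < 0.01:
        return "less than 1 percent"
    if ratio < 0.05:
        return "less than 5 percent"
    if ratio < 0.10:
        return "less than 10 percent"
    return "substantially cross-reactive"


# Hypothetical association constants (per molar).
tier = cross_reactivity_tier(1e9, 5e6)
```

If dissociation constants were measured instead, the ratio would need to be inverted before applying the same thresholds.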

The term antibody as used herein is intended to include fragments thereof which
are also specifically reactive with S. pneumoniae polypeptides. Antibodies can be
fragmented using conventional techniques and the fragments screened for utility in the
same manner as described above for whole antibodies. For example, F(ab')2 fragments
can be generated by treating antibody with pepsin. The resulting F(ab')2 fragment can be
treated to reduce disulfide bridges to produce Fab' fragments. The antibody of the
invention is further intended to include bispecific and chimeric molecules having an anti-
S. pneumoniae portion.

Both monoclonal and polyclonal antibodies (Ab) directed against S. pneumoniae
polypeptides or S. pneumoniae polypeptide variants, and antibody fragments such as Fab'
and F(ab')2, can be used to block the action of S. pneumoniae polypeptide and allow the
study of the role of a particular S. pneumoniae polypeptide of the invention in aberrant or
unwanted intracellular signaling, as well as the normal cellular function of S.
pneumoniae, for example by microinjection of anti-S. pneumoniae polypeptide antibodies
of the present invention.

Antibodies which specifically bind S. pneumoniae epitopes can also be used in
immunohistochemical staining of tissue samples in order to evaluate the abundance and
pattern of expression of S. pneumoniae antigens. Anti-S. pneumoniae polypeptide
antibodies can be used diagnostically in immuno-precipitation and immuno-blotting to
detect and evaluate S. pneumoniae levels in tissue or bodily fluid as part of a clinical
testing procedure. Likewise, the ability to monitor S. pneumoniae polypeptide levels in
an individual can allow determination of the efficacy of a given treatment regimen for an
individual afflicted with such a disorder. The level of an S. pneumoniae polypeptide can
be measured in cells found in bodily fluid, such as in urine samples or can be measured
in tissue, such as produced by gastric biopsy. Diagnostic assays using anti-S.
pneumoniae antibodies can include, for example, immunoassays designed to aid in early
diagnosis of S. pneumoniae infections. The present invention can also be used as a
method of detecting antibodies contained in samples from individuals infected by this
bacterium using specific S. pneumoniae antigens.

Another application of anti-S. pneumoniae polypeptide antibodies of the
invention is in the immunological screening of cDNA libraries constructed in expression
vectors such as λgt11, λgt18-23, λZAP, and λORF8. Messenger libraries of this type,
having coding sequences inserted in the correct reading frame and orientation, can
produce fusion proteins. For instance, λgt11 will produce fusion proteins whose amino
termini consist of β-galactosidase amino acid sequences and whose carboxy termini
consist of a foreign polypeptide. Antigenic epitopes of a subject S. pneumoniae
polypeptide can then be detected with antibodies, as, for example, reacting nitrocellulose
filters lifted from infected plates with anti-S. pneumoniae polypeptide antibodies. Phage,
scored by this assay, can then be isolated from the infected plate. Thus, the presence of
S. pneumoniae gene homologs can be detected and cloned from other species, and
alternate isoforms (including splicing variants) can be detected and cloned.

Kits Containing Nucleic Acids, Polypeptides or Antibodies of the Invention

The nucleic acids, polypeptides and antibodies of the invention can be combined
with other reagents and articles to form kits. Kits for diagnostic purposes typically
comprise the nucleic acids, polypeptides or antibodies in vials or other suitable vessels.
Kits typically comprise other reagents for performing hybridization reactions, polymerase
chain reactions (PCR), or for reconstitution of lyophilized components, such as aqueous
media, salts, buffers, and the like. Kits may also comprise reagents for sample
processing such as detergents, chaotropic salts and the like. Kits may also comprise
immobilization means such as particles, supports, wells, dipsticks and the like. Kits may
also comprise labeling means such as dyes, developing reagents, radioisotopes,
fluorescent agents, luminescent or chemiluminescent agents, enzymes, intercalating
agents and the like. With the nucleic acid and amino acid sequence information provided
herein, individuals skilled in the art can readily assemble kits to serve their particular
purpose. Kits further can include instructions for use.

Drug Screening Assays Using S. pneumoniae Polypeptides

By making available purified and recombinant S. pneumoniae polypeptides, the
present invention provides assays which can be used to screen for drugs which are either
agonists or antagonists of the normal cellular function, in this case, of the subject S.
pneumoniae polypeptides, or of their role in intracellular signaling. Such inhibitors or
potentiators may be useful as new therapeutic agents to combat S. pneumoniae infections
in humans. A variety of assay formats will suffice and, in light of the present inventions,
will be comprehended by the skilled artisan.

In many drug screening programs which test libraries of compounds and natural
extracts, high throughput assays are desirable in order to maximize the number of
compounds surveyed in a given period of time. Assays which are performed in cell-free
systems, such as may be derived with purified or semi-purified proteins, are often
preferred as "primary" screens in that they can be generated to permit rapid development
and relatively easy detection of an alteration in a molecular target which is mediated by a
test compound. Moreover, the effects of cellular toxicity and/or bioavailability of the test
compound can be generally ignored in the in vitro system, the assay instead being
focused primarily on the effect of the drug on the molecular target as may be manifest in
an alteration of binding affinity with other proteins or change in enzymatic properties of
the molecular target. Accordingly, in an exemplary screening assay of the present
invention, the compound of interest is contacted with an isolated and purified S.
pneumoniae polypeptide.

Screening assays can be constructed in vitro with a purified S. pneumoniae
polypeptide or fragment thereof, such as an S. pneumoniae polypeptide having enzymatic
activity, such that the activity of the polypeptide produces a detectable reaction product.
The efficacy of the compound can be assessed by generating dose response curves from
data obtained using various concentrations of the test compound. Moreover, a control
assay can also be performed to provide a baseline for comparison. Suitable products
include those with distinctive absorption, fluorescence, or chemi-luminescence
properties, for example, because detection may be easily automated. A variety of
synthetic or naturally occurring compounds can be tested in the assay to identify those
which inhibit or potentiate the activity of the S. pneumoniae polypeptide. Some of these
active compounds may directly, or with chemical alterations to promote membrane
permeability or solubility, also inhibit or potentiate the same activity (e.g., enzymatic
activity) in whole, live S. pneumoniae cells.
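
By way of illustration only, the dose response analysis mentioned above can be sketched as
follows. The sketch normalizes each signal to the control assay and estimates the
concentration giving 50% of control activity by interpolating between the two bracketing
data points on a log concentration scale. The function names, the 50% criterion and the
interpolation method are assumptions made for this example; the specification does not
prescribe any particular curve-fitting procedure.

```python
import math

def percent_activity(signal, control_signal, background=0.0):
    """Express an assay signal as a percentage of the no-compound control."""
    return 100.0 * (signal - background) / (control_signal - background)

def estimate_ic50(concentrations, activities):
    """Estimate the concentration giving 50% activity.

    Interpolates linearly in log10(concentration) between the two data
    points that bracket 50%. Concentrations must be positive. Returns
    None if the curve never crosses 50%.
    """
    points = sorted(zip(concentrations, activities))
    for (c_lo, a_lo), (c_hi, a_hi) in zip(points, points[1:]):
        crosses = (a_lo - 50.0) * (a_hi - 50.0) <= 0
        if crosses and a_lo != a_hi:
            frac = (a_lo - 50.0) / (a_lo - a_hi)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None

# Hypothetical data: concentrations in micromolar, activity in % of control.
ic50 = estimate_ic50([0.1, 1.0, 10.0, 100.0], [95.0, 75.0, 25.0, 5.0])
```

A compound whose activity never falls to 50% over the tested range returns None, which
can be read as "no IC50 within the range tested" rather than as inactivity.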
Overexpression Assays 

Overexpression assays are based on the premise that overproduction of a protein
would lead to a higher level of resistance to compounds that selectively interfere with the
function of that protein. Overexpression assays may be used to identify compounds that
interfere with the function of virtually any type of protein, including without limitation
enzymes, receptors, DNA- or RNA-binding proteins, or any proteins that are directly or
indirectly involved in regulating cell growth.

Typically, two bacterial strains are constructed. One contains a single copy of the
gene of interest, and a second contains several copies of the same gene. Identification of
useful inhibitory compounds in this type of assay is based on a comparison of the activity
of a test compound in inhibiting growth and/or viability of the two strains. The method
involves constructing a nucleic acid vector that directs high level expression of a
particular target nucleic acid. The vectors are then transformed into host cells in single
or multiple copies to produce strains that express low to moderate and high levels of
protein encoded by the target sequence (strains A and B, respectively). Nucleic acid
comprising sequences encoding the target gene can, of course, be directly integrated into
the host cell.

Large numbers of compounds (or crude substances which may contain active
compounds) are screened for their effect on the growth of the two strains. Agents which
interfere with an unrelated target equally inhibit the growth of both strains. Agents
which interfere with the function of the target at high concentration should inhibit the
growth of both strains. It should be possible, however, to titrate out the inhibitory effect
of the compound in the overexpressing strain. That is, if the compound is affecting the
particular target that is being tested, it should be possible to inhibit the growth of strain A
at a concentration of the compound that allows strain B to grow.
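
The comparison of strains A and B described above can be sketched, under stated
assumptions, as a simple classification on the minimal inhibitory concentration (MIC)
measured for each strain. The function name, the use of MIC as the read-out, and the
four-fold shift threshold are hypothetical choices made for this illustration and are not
taken from the specification.

```python
def classify_compound(mic_strain_a, mic_strain_b, min_shift=4.0):
    """Classify a test compound from its MIC in each strain.

    mic_strain_a: MIC for strain A (low to moderate target expression).
    mic_strain_b: MIC for strain B (high target expression).
    Both are in the same units; None means no inhibition was seen at
    the highest concentration tested. min_shift is an assumed cut-off
    for how much more resistant strain B must be.
    """
    if mic_strain_a is None and mic_strain_b is None:
        return "inactive"
    if mic_strain_a is None:
        # Inhibits only the overexpressing strain: not the expected pattern.
        return "inconclusive"
    if mic_strain_b is None or mic_strain_b / mic_strain_a >= min_shift:
        # Strain A is inhibited at a concentration that lets strain B grow.
        return "candidate target-specific inhibitor"
    # Both strains are inhibited at similar concentrations.
    return "likely unrelated target"
```

For example, a compound with an MIC of 2 for strain A and 32 for strain B would be
flagged as a candidate, whereas equal MICs in both strains would point to an unrelated
target.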

Alternatively, a bacterial strain is constructed that contains the gene of interest
under the control of an inducible promoter. Identification of useful inhibitory agents
using this type of assay is based on a comparison of the activity of a test compound in
inhibiting growth and/or viability of this strain under both inducing and non-inducing
conditions. The method involves constructing a nucleic acid vector that directs high-level
expression of a particular target nucleic acid. The vector is then transformed into
host cells that are grown under both non-inducing and inducing conditions (conditions A
and B, respectively).

Large numbers of compounds (or crude substances which may contain active
compounds) are screened for their effect on growth under these two conditions. Agents
that interfere with the function of the target should inhibit growth under both conditions.
It should be possible, however, to titrate out the inhibitory effect of the compound in the
overexpressing strain. That is, if the compound is affecting the particular target that is
being tested, it should be possible to inhibit growth under condition A at a concentration
that allows the strain to grow under condition B.

Ligand-binding Assays

Many of the targets according to the invention have functions that have not yet
been identified. Ligand-binding assays are useful to identify inhibitor compounds that
interfere with the function of a particular target, even when that function is unknown.
These assays are designed to detect binding of test compounds to particular targets. The
detection may involve direct measurement of binding. Alternatively, indirect indications
of binding may involve stabilization of protein structure or disruption of a biological
function. Non-limiting examples of useful ligand-binding assays are detailed below.

A useful method for the detection and isolation of binding proteins is the
Biomolecular Interaction Assay (BIAcore) system developed by Pharmacia Biosensor
and described in the manufacturer's protocol (LKB Pharmacia, Sweden). The BIAcore
system uses an affinity purified anti-GST antibody to immobilize GST-fusion proteins
onto a sensor chip. The sensor utilizes surface plasmon resonance, which is an optical
phenomenon that detects changes in refractive indices. In accordance with the practice of
the invention, a protein of interest is coated onto a chip and test compounds are passed
over the chip. Binding is detected by a change in the refractive index (surface plasmon
resonance).

A different type of ligand-binding assay involves scintillation proximity assays
(SPA, described in U.S. Patent No. 4,568,649).

Another type of ligand binding assay, also undergoing development, is based on
the fact that proteins containing mitochondrial targeting signals are imported into isolated
mitochondria in vitro (Hurt et al., 1985, EMBO J. 4:2061-2068; Eilers and Schatz, 1986,
Nature 322:228-231). In a mitochondrial import assay, expression vectors are constructed
in which nucleic acids encoding particular target proteins are inserted downstream of
sequences encoding mitochondrial import signals. The chimeric proteins are synthesized
and tested for their ability to be imported into isolated mitochondria in the absence and
presence of test compounds. A test compound that binds to the target protein should
inhibit its uptake into isolated mitochondria in vitro.

Another ligand-binding assay is the yeast two-hybrid system (Fields and Song,
1989, Nature 340:245-246). The yeast two-hybrid system takes advantage of the
properties of the GAL4 protein of the yeast Saccharomyces cerevisiae. The GAL4
protein is a transcriptional activator required for the expression of genes encoding
enzymes of galactose utilization. This protein consists of two separable and functionally
essential domains: an N-terminal domain which binds to specific DNA sequences
(UASG); and a C-terminal domain containing acidic regions, which is necessary to
activate transcription. The native GAL4 protein, containing both domains, is a potent
activator of transcription when yeast are grown on galactose media. The N-terminal
domain binds to DNA in a sequence-specific manner but is unable to activate
transcription. The C-terminal domain contains the activating regions but cannot activate
transcription because it fails to be localized to UASG. The two-hybrid system uses
two hybrid proteins containing parts of GAL4: (1) a GAL4 DNA-binding domain
fused to a protein 'X' and (2) a GAL4 activation region fused to a protein 'Y'. If X and Y
can form a protein-protein complex and reconstitute proximity of the GAL4 domains,
transcription of a gene regulated by UASG occurs. Creation of two hybrid proteins, each
containing one of the interacting proteins X and Y, allows the activation region of GAL4
to be brought to its normal site of action.

The binding assay described in Fodor et al., 1991, Science 251:767-773, which
involves testing the binding affinity of test compounds for a plurality of defined polymers
synthesized on a solid substrate, may also be useful.

Compounds which bind to the polypeptides of the invention are potentially useful
as antibacterial agents for use in therapeutic compositions.

Pharmaceutical formulations suitable for antibacterial therapy comprise the
antibacterial agent in conjunction with one or more biologically acceptable carriers.
Suitable biologically acceptable carriers include, but are not limited to,
phosphate-buffered saline, saline, deionized water, or the like. Preferred biologically
acceptable carriers are physiologically or pharmaceutically acceptable carriers.

The antibacterial compositions include an antibacterial effective amount of active
agent. Antibacterial effective amounts are those quantities of the antibacterial agents of
the present invention that afford prophylactic protection against bacterial infections or
which result in amelioration or cure of an existing bacterial infection. This antibacterial
effective amount will depend upon the agent, the location and nature of the infection, and
the particular host. The amount can be determined by experimentation known in the art,
such as by establishing a matrix of dosages and frequencies and comparing a group of
experimental units or subjects to each point in the matrix.

The antibacterial active agents or compositions can be formed into dosage unit
forms, such as, for example, creams, ointments, lotions, powders, liquids, tablets,
capsules, suppositories, sprays, aerosols or the like. If the antibacterial composition is
formulated into a dosage unit form, the dosage unit form may contain an antibacterial
effective amount of active agent. Alternatively, the dosage unit form may include less
than such an amount if multiple dosage unit forms or multiple dosages are to be used to
administer a total dosage of the active agent. Dosage unit forms can include, in addition,
one or more excipient(s), diluent(s), disintegrant(s), lubricant(s), plasticizer(s),
colorant(s), dosage vehicle(s), absorption enhancer(s), stabilizer(s), bactericide(s), or the
like.

For general information concerning formulations, see, e.g., Gilman et al. (eds.),
1990, Goodman and Gilman's: The Pharmacological Basis of Therapeutics, 8th ed.,
Pergamon Press; Remington's Pharmaceutical Sciences, 17th ed., 1990, Mack
Publishing Co., Easton, PA; Avis et al. (eds.), 1993, Pharmaceutical Dosage Forms:
Parenteral Medications, Dekker, New York; and Lieberman et al. (eds.), 1990,
Pharmaceutical Dosage Forms: Disperse Systems, Dekker, New York.

The antibacterial agents and compositions of the present invention are useful for
preventing or treating S. pneumoniae infections. Infection prevention methods
incorporate a prophylactically effective amount of an antibacterial agent or composition.
A prophylactically effective amount is an amount effective to prevent S. pneumoniae
infection and will depend upon the specific bacterial strain, the agent, and the host.
These amounts can be determined experimentally by methods known in the art and as
described above.

S. pneumoniae infection treatment methods incorporate a therapeutically effective
amount of an antibacterial agent or composition. A therapeutically effective amount is an
amount sufficient to ameliorate or eliminate the infection. The prophylactically and/or
therapeutically effective amounts can be administered in one administration or over
repeated administrations. Therapeutic administration can be followed by prophylactic
administration, once the initial bacterial infection has been resolved.

The antibacterial agents and compositions can be administered topically or
systemically. Topical application is typically achieved by administration of creams,
ointments, lotions, or sprays as described above. Systemic administration includes both
oral and parenteral routes. Parenteral routes include, without limitation, subcutaneous,
intramuscular, intraperitoneal, intravenous, transdermal, inhalation and intranasal
administration.

EXEMPLIFICATION

I. Cloning and Sequencing of S. pneumoniae DNA

S. pneumoniae chromosomal DNA was isolated according to a basic DNA
protocol outlined in Schleif R.F. and Wensink P.C., Practical Methods in Molecular
Biology, p. 98, Springer-Verlag, N.Y., 1981, with minor modifications. Briefly, cells were
pelleted and resuspended in TE (10 mM Tris, 1 mM EDTA, pH 7.6), and GES lysis buffer
(5.1 M guanidinium thiocyanate, 0.1 M EDTA, pH 8.0, 0.5% N-laurylsarcosine) was
added. The suspension was chilled and ammonium acetate (NH4Ac) was added to a final
concentration of 2.0 M. DNA was extracted, first with chloroform, then with
phenol-chloroform, and reextracted with chloroform. DNA was precipitated with isopropanol,
washed twice with 70% EtOH, dried and resuspended in TE.

Following isolation, whole genomic S. pneumoniae DNA was nebulized
(Bodenteich et al., Automated DNA Sequencing and Analysis (J.C. Venter, ed.),
Academic Press, 1994) to a median size of 2000 bp. After nebulization, the DNA was
concentrated and separated on a standard 1% agarose gel. Several fractions,
corresponding to approximate sizes 1000-1500 bp, 1500-2000 bp, 2000-2500 bp, and
2500-3000 bp, were excised from the gel and purified by the GeneClean procedure
(Bio101, Inc.).

The purified DNA fragments were then blunt-ended using T4 DNA polymerase.
The healed DNA was then ligated to unique BstXI-linker adapters (5'
GTCTTCACCACGGGG and 5' GTGGTGAAGAC in 100-1000 fold molar excess).
These linkers are complementary to the BstXI-cut pMPX vectors, while the overhang is
not self-complementary. Therefore, the linkers will not concatemerize nor will the cut
vector religate itself easily. The linker-adapted inserts were separated from the
unincorporated linkers on a 1% agarose gel and purified using GeneClean. The
linker-adapted inserts were then ligated to each of 20 pMPX vectors to construct a series of
"shotgun" subclone libraries. Blunt-ended vector was used for cloning into the pUC19
vector. The vectors contain an out-of-frame lacZ gene at the cloning site which becomes
in-frame in the event that an adapter-dimer is cloned, allowing these to be avoided by
their blue color.
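
The statement that the linker overhang is not self-complementary can be checked directly
from the two adapter sequences given above. The following sketch anneals the short
oligonucleotide to the 5' end of the long one, reports the remaining single-stranded 3'
tail, and tests whether that tail could pair with a copy of itself. The pairing geometry is
inferred from the sequences as printed, and the function names are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def three_prime_overhang(long_oligo, short_oligo):
    """Return the single-stranded 3' tail left on long_oligo when
    short_oligo anneals to its 5' end. Raises ValueError if the two
    oligonucleotides do not pair at that position."""
    duplex = reverse_complement(short_oligo)
    if not long_oligo.startswith(duplex):
        raise ValueError("oligos do not anneal at the 5' end of the long strand")
    return long_oligo[len(duplex):]

def is_self_complementary(seq):
    """True if seq can base-pair with another copy of itself."""
    return seq == reverse_complement(seq)

# Adapter sequences as given in the text, written 5' to 3'.
overhang = three_prime_overhang("GTCTTCACCACGGGG", "GTGGTGAAGAC")
can_concatemerize = is_self_complementary(overhang)
```

With these sequences the tail is GGGG, whose reverse complement is CCCC, so two adapters
cannot pair with each other through their overhangs. By contrast, a palindromic end such
as that left by EcoRI (GAATTC) would be self-complementary.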
All subsequent steps were based either on the multiplex DNA sequencing
protocols outlined in Church G.M. and Kieffer-Higgins S., Science 240:185-188, 1988 or
on ABI377 automated DNA sequencing methods. Only major modifications to the
protocols are highlighted. Briefly, each of the 20 vectors was then transformed into
DH5α competent cells (Gibco/BRL, DH5α transformation protocol). The libraries were
assessed by plating onto antibiotic plates containing ampicillin, methicillin and
IPTG/Xgal. The plates were incubated overnight at 37°C. Successful transformants were
then used for plating of clones and pooling into the multiplex pools. The clones were
picked and pooled into 40 ml growth medium cultures. The cultures were grown
overnight at 37°C. DNA was purified using the Qiagen Midi-prep kits and Tip-100
columns (Qiagen, Inc.). In this manner, 100 µg of DNA was obtained per pool.

These purified DNA samples were then sequenced either using the multiplex
DNA sequencing based on chemical degradation methods (Church G.M. and
Kieffer-Higgins S., Science 240:185-188, 1988) or by SequiTherm (Epicentre Technologies)
dideoxy sequencing protocols or by ABI dye-terminator chemistry. For the multiplex
portion, the sequencing reactions were electrophoresed and transferred onto nylon
membranes by direct transfer electrophoresis from 40 cm gels (Richterich P. and Church
G.M., Methods in Enzymology 218:187-222, 1993). The DNA was covalently bound to
the membranes by exposure to ultraviolet light, and hybridized with labeled
oligonucleotides complementary to tag sequences on the vectors (Church, supra). The
membranes were washed to rinse off non-specifically bound probe, and exposed to X-ray
film to visualize individual sequence ladders. After autoradiography, the hybridized
probe was removed by incubation at 65°C, and the hybridization cycle repeated with
another tag sequence until the membrane had been probed 41 times. Thus, each gel
produced a large number of films, each containing new sequencing information.

Whenever a new blot was processed, it was initially probed for an internal standard
sequence added to each of the pools. Digital images of the films were generated using a
laser-scanning densitometer (Molecular Dynamics, Sunnyvale, CA). The digitized
images were processed on computer workstations (VaxStation 4000's) using the program
REPLICA™ (Church et al., Automated DNA Sequencing and Analysis (J.C. Venter,
ed.), Academic Press, 1994). Image processing included lane straightening, contrast
adjustment to smooth out intensity differences, and resolution enhancement by iterative
Gaussian deconvolution. The sequences were then converted to SCF format so that
processing and assembly could proceed on UNIX machines. The ABI dye terminator
sequence reads were run on ABI377 machines and the data were directly transferred to
UNIX machines following lane tracking of the gels. All multiplex and ABI reads were
assembled using PHRAP (P. Green, Abstracts of DOE Human Genome Program
Contractor-Grantee Workshop V, Jan. 1996, p. 157) with default parameters and without
using quality scores. The initial assembly was done at 7-fold coverage and yielded 511
contigs. Short read length fragments of 200 bp or less found on the ends of contigs
facing in the appropriate direction were used to extend off the end of the contigs. These
reads were then resequenced with primers using ABI technology to give sequences with a
read length of 500 or more bases. This allowed end extensions to be performed without
ordering new primers. In addition, missing mates (sequences from clones that only gave
one strand reads) were identified and sequenced with ABI technology to allow the
identification of additional overlapping contigs.

End-sequencing of randomly picked genomic lambda clones was also performed.
Sequencing on both sides was done for all lambda sequences. The lambda library
backbone helped to verify the integrity of the assembly and allowed closure of some of
the physical gaps.

To identify S. pneumoniae polypeptides, the complete genomic sequence of S.
pneumoniae was analyzed essentially as follows: First, all possible stop-to-stop open
reading frames (ORFs) greater than 180 nucleotides in all six reading frames were
translated into amino acid sequences. Second, the identified ORFs were analyzed for
homology to known (archaebacterial, prokaryotic and eukaryotic) protein sequences. Third,
the coding potential of non-homologous sequences was evaluated with the program
GENEMARK™ (Borodovsky and McIninch, 1993, Comp. Chem. 17:123).
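
The first step of the analysis above, enumeration of stop-to-stop ORFs in all six reading
frames, can be sketched as follows. This is a minimal illustration and not the software
actually used: the function name is hypothetical, the standard stop codons TAA, TAG and
TGA are assumed, and stretches that run to the end of the sequence are treated as bounded
by the sequence end. Translation and the homology search are omitted.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def stop_to_stop_orfs(seq, min_len=180):
    """Yield (strand, frame, start, end) for each stop-free stretch
    longer than min_len nucleotides in all six reading frames.

    start and end are 0-based offsets on the scanned strand, with end
    exclusive; the stop codons themselves are not included.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = frame
            for pos in range(frame, len(s) - 2, 3):
                if s[pos:pos + 3] in STOP_CODONS:
                    if pos - start > min_len:
                        yield strand, frame, start, pos
                    start = pos + 3
            # Assumption: an open stretch at the sequence end also counts.
            end = frame + 3 * ((len(s) - frame) // 3)
            if end - start > min_len:
                yield strand, frame, start, end

# Toy sequence: a 183-nucleotide stop-free stretch between two stop codons.
toy = "TAA" + "GCT" * 61 + "TAG"
orfs = list(stop_to_stop_orfs(toy))
```

In the toy sequence the forward frame 0 contains one qualifying stretch, from offset 3 to
offset 186. A stretch of exactly 180 nucleotides would be excluded, since the criterion is
"greater than 180 nucleotides".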

Identification, Cloning and Expression of S. pneumoniae Nucleic Acids

Expression and purification of the S. pneumoniae polypeptides of the invention
can be performed essentially as outlined below.

To facilitate the cloning, expression and purification of membrane and secreted
proteins from S. pneumoniae, a gene expression system, such as the pET System
(Novagen), for cloning and expression of recombinant proteins in E. coli, is selected.
Also, a DNA sequence encoding a peptide tag, the His-Tag, is fused to the 3' end of
DNA sequences of interest in order to facilitate purification of the recombinant protein
products. The 3' end is selected for fusion in order to avoid alteration of any 5' terminal
signal sequence.

PCR Amplification and Cloning of Nucleic Acids Containing ORFs Encoding Enzymes

Nucleic acids chosen (for example, from the nucleic acids set forth in SEQ ID
NO: 1 - SEQ ID NO: 2603) for cloning from the 14453 strain of S. pneumoniae are
prepared for amplification cloning by polymerase chain reaction (PCR). Synthetic
oligonucleotide primers specific for the 5' and 3' ends of open reading frames (ORFs)
are designed and purchased from GibcoBRL Life Technologies (Gaithersburg, MD,
USA). All forward primers (specific for the 5' end of the sequence) are designed to
include an NcoI cloning site at the extreme 5' terminus. These primers are designed to
permit initiation of protein translation at a methionine residue followed by a valine
residue and the coding sequence for the remainder of the native S. pneumoniae DNA
sequence. All reverse primers (specific for the 3' end of any S. pneumoniae ORF)
include an EcoRI site at the extreme 5' terminus to permit cloning of each S. pneumoniae
sequence into the reading frame of the pET-28b. The pET-28b vector provides sequence
encoding an additional 20 carboxy-terminal amino acids including six histidine residues
(at the extreme C-terminus), which comprise the His-Tag.

Genomic DNA prepared from strain 14453 of S. pneumoniae is used as the
source of template DNA for PCR amplification reactions (Current Protocols in Molecular
Biology, John Wiley and Sons, Inc., F. Ausubel et al., eds., 1994). To amplify a DNA
sequence containing an S. pneumoniae ORF, genomic DNA (50 nanograms) is
introduced into a reaction vial containing 2 mM MgCl2, 1 micromolar synthetic
oligonucleotide primers (forward and reverse primers) complementary to and flanking a
defined S. pneumoniae ORF, 0.2 mM of each deoxynucleotide triphosphate (dATP,
dGTP, dCTP, dTTP) and 2.5 units of heat stable DNA polymerase (Amplitaq, Roche
Molecular Systems, Inc., Branchburg, NJ, USA) in a final volume of 100 microliters.
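
As a worked illustration of assembling such a reaction, the volume of each stock solution
needed to reach the stated final concentrations in 100 microliters follows from the dilution
relation C1 x V1 = C2 x V2. The final concentrations below are those given above; the
stock concentrations (25 mM MgCl2, 10 micromolar primers, 10 mM each dNTP, polymerase
at 5 units per microliter) are assumptions chosen only for the example.

```python
def stock_volume_ul(final_conc, stock_conc, final_volume_ul):
    """Volume of stock needed, from C1 * V1 = C2 * V2.
    final_conc and stock_conc must be in the same units."""
    return final_conc * final_volume_ul / stock_conc

FINAL_VOLUME_UL = 100.0

# component: (final concentration from the text, ASSUMED stock concentration)
components = {
    "MgCl2 (mM)": (2.0, 25.0),
    "forward primer (uM)": (1.0, 10.0),
    "reverse primer (uM)": (1.0, 10.0),
    "each dNTP (mM)": (0.2, 10.0),
}

volumes_ul = {
    name: stock_volume_ul(final, stock, FINAL_VOLUME_UL)
    for name, (final, stock) in components.items()
}

# 2.5 units of polymerase from an ASSUMED 5 units/microliter stock.
volumes_ul["DNA polymerase"] = 2.5 / 5.0

# Template DNA, reaction buffer and water make up the remaining volume.
remaining_ul = FINAL_VOLUME_UL - sum(volumes_ul.values())
```

With these assumed stocks the reaction takes 8 microliters of MgCl2, 10 microliters of each
primer, 2 microliters of dNTP mix and 0.5 microliter of polymerase, leaving 69.5 microliters
for template, buffer and water.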






Upon completion of thermal cycling reactions, each sample of amplified DNA is
washed and purified using the Qiaquick Spin PCR purification kit (Qiagen, Gaithersburg,
MD, USA). All amplified DNA samples are subjected to digestion with the restriction
endonucleases, e.g., NcoI and EcoRI (New England BioLabs, Beverly, MA,
USA) (Current Protocols in Molecular Biology, John Wiley and Sons, Inc., F. Ausubel et
al., eds., 1994). DNA samples are then subjected to electrophoresis on 1.0% NuSieve
(FMC BioProducts, Rockland, ME, USA) agarose gels. DNA is visualized by exposure
to ethidium bromide and long wave UV irradiation. DNA contained in slices isolated
from the agarose gel is purified using the Bio 101 GeneClean Kit protocol (Bio 101,
Vista, CA, USA).

Cloning of S. pneumoniae Nucleic Acids Into an Expression Vector

The pET-28b vector is prepared for cloning by digestion with endonucleases, e.g.,
NcoI and EcoRI (Current Protocols in Molecular Biology, John Wiley and Sons, Inc., F.
Ausubel et al., eds., 1994). The pET-28a vector, which encodes a His-Tag that can be
fused to the 5' end of an inserted gene, is prepared by digestion with appropriate
restriction endonucleases.

Following digestion, DNA inserts are cloned (Current Protocols in Molecular
Biology, John Wiley and Sons, Inc., F. Ausubel et al., eds., 1994) into the previously
digested pET-28b expression vector. Products of the ligation reaction are then used to
transform the BL21 strain of E. coli (Current Protocols in Molecular Biology, John Wiley
and Sons, Inc., F. Ausubel et al., eds., 1994) as described below.

Transformation Of Competent Bacteria With Recombinant Plasmids

Competent bacteria, E. coli strain BL21 or E. coli strain BL21(DE3), are
transformed with recombinant pET expression plasmids carrying the cloned S.
pneumoniae sequences according to standard methods (Current Protocols in Molecular
Biology, John Wiley and Sons, Inc., F. Ausubel et al., eds., 1994). Briefly, 1 microliter of
ligation reaction is mixed with 50 microliters of electrocompetent cells and subjected to a
high voltage pulse, after which samples are incubated in 0.45 milliliters SOC medium (0.5%
yeast extract, 2.0% tryptone, 10 mM NaCl, 2.5 mM KCl, 10 mM MgCl2, 10 mM
MgSO4 and 20 mM glucose) at 37°C with shaking for 1 hour. Samples are then spread
on LB agar plates containing 25 microgram/ml kanamycin sulfate for growth overnight.
Transformed colonies of BL21 are then picked and analyzed to evaluate cloned inserts as
described below.

Identification Of Recombinant Expression Vectors With S. pneumoniae Nucleic Acids

Individual BL21 clones transformed with recombinant pET-28b S. pneumoniae
ORFs are analyzed by PCR amplification of the cloned inserts using the same forward
and reverse primers, specific for each S. pneumoniae sequence, that were used in the
original PCR amplification cloning reactions. Successful amplification verifies the
integration of the S. pneumoniae sequences in the expression vector (Current Protocols in
Molecular Biology, John Wiley and Sons, Inc., F. Ausubel et al., eds., 1994).

Isolation and Preparation of Nucleic Acids From Transformants

Individual clones of recombinant pET-28b vectors carrying properly cloned S.
pneumoniae ORFs are picked and incubated in 5 ml of LB broth plus 25 microgram/ml
kanamycin sulfate overnight. The following day plasmid DNA is isolated and purified
using the Qiagen plasmid purification protocol (Qiagen Inc., Chatsworth, CA, USA).

Expression Of Recombinant S. pneumoniae Sequences In E. coli

The pET vector can be propagated in any E. coli K-12 strain, e.g., HMS174,
HB101, JM109, DH5, etc., for the purpose of cloning or plasmid preparation. Hosts for
expression include E. coli strains containing a chromosomal copy of the gene for T7
RNA polymerase. These hosts are lysogens of bacteriophage DE3, a lambda derivative
that carries the lacI gene, the lacUV5 promoter and the gene for T7 RNA polymerase. T7
RNA polymerase is induced by addition of isopropyl-β-D-thiogalactoside (IPTG), and
the T7 RNA polymerase transcribes any target plasmid, such as pET-28b, carrying its
gene of interest. Strains used include: BL21(DE3) (Studier, F.W., Rosenberg, A.H.,
Dunn, J.J., and Dubendorff, J.W. (1990) Meth. Enzymol. 185, 60-89).

To express recombinant S. pneumoniae sequences, 50 nanograms of plasmid
DNA isolated as described above is used to transform competent BL21(DE3) bacteria as
described above (provided by Novagen as part of the pET expression system kit). The
lacZ gene (beta-galactosidase) is expressed in the pET-System as described for the S.
pneumoniae recombinant constructions. Transformed cells are cultured in SOC medium
for 1 hour, and the culture is then plated on LB plates containing 25 micrograms/ml
kanamycin sulfate. The following day, bacterial colonies are pooled and grown in LB
medium containing kanamycin sulfate (25 micrograms/ml) to an optical density at 600
nm of 0.5 to 1.0 O.D. units, at which point 1 millimolar IPTG is added to the culture
for 3 hours to induce gene expression of the S. pneumoniae recombinant DNA
constructions.

After induction of gene expression with IPTG, bacteria are pelleted by
centrifugation in a Sorvall RC-3B centrifuge at 3500 x g for 15 minutes at 4°C. Pellets
are resuspended in 50 milliliters of cold 10 mM Tris-HCl, pH 8.0, 0.1 M NaCl and 0.1
mM EDTA (STE buffer). Cells are then centrifuged at 2000 x g for 20 min at 4°C. Wet
pellets are weighed and frozen at -80°C until ready for protein purification.

A variety of methodologies known in the art can be utilized to purify the isolated
proteins (Current Protocols in Protein Science, John Wiley and Sons, Inc., J. E. Coligan
et al., eds., 1995). For example, the frozen cells may be thawed, resuspended in buffer
and ruptured by several passages through a small volume microfluidizer (Model M-110S,
Microfluidics International Corporation, Newton, MA). The resultant homogenate may
be centrifuged to yield a clear supernatant (crude extract) and, following filtration, the
crude extract may be fractionated over columns. Fractions may be monitored by
absorbance at 280 nm, and peak fractions may be analyzed by SDS-PAGE.

The concentrations of purified protein preparations may be quantified
spectrophotometrically using absorbance coefficients calculated from amino acid content
(Perkins, S.J. 1986 Eur. J. Biochem. 157, 169-180). Protein concentrations are also
measured by the method of Bradford, M.M. (1976) Anal. Biochem. 72, 248-254, and
Lowry, O.H., Rosebrough, N., Farr, A.L. & Randall, R.J. (1951) J. Biol. Chem. 193,
pages 265-275, using bovine serum albumin as a standard.
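
The spectrophotometric quantification described above can be illustrated with a short
sketch that estimates a molar absorbance coefficient at 280 nm from amino acid content and
converts a measured absorbance to concentration by the Beer-Lambert law. The per-residue
values used here (5500 for tryptophan, 1490 for tyrosine and 125 per disulfide, in
M^-1 cm^-1) are a widely used approximation commonly attributed to Pace and co-workers;
they are substituted for illustration and are not necessarily the coefficients of the Perkins
reference cited above. The function names and the example sequence are hypothetical.

```python
def molar_extinction_280(sequence, n_disulfides=0):
    """Approximate molar absorbance coefficient at 280 nm (M^-1 cm^-1)
    from the tryptophan and tyrosine content of a one-letter amino acid
    sequence, plus an optional disulfide contribution."""
    seq = sequence.upper()
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * n_disulfides

def concentration_molar(a280, epsilon, path_cm=1.0):
    """Beer-Lambert law: concentration = A / (epsilon * path length)."""
    if epsilon <= 0:
        raise ValueError("sequence has no absorbing residues at 280 nm")
    return a280 / (epsilon * path_cm)

# Hypothetical five-residue peptide with one Trp and two Tyr.
epsilon = molar_extinction_280("MWYYA")
conc = concentration_molar(0.848, epsilon)
```

A protein lacking tryptophan, tyrosine and disulfides cannot be quantified this way, which
is one reason the dye-binding and Lowry methods cited above are used alongside it.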

SDS-polyacrylamide gels of various concentrations may be purchased from
BioRad (Hercules, CA, USA), and stained with Coomassie blue. Molecular weight
markers may include rabbit skeletal muscle myosin (200 kDa), E. coli β-galactosidase
(116 kDa), rabbit muscle phosphorylase B (97.4 kDa), bovine serum albumin (66.2 kDa),
ovalbumin (45 kDa), bovine carbonic anhydrase (31 kDa), soybean trypsin inhibitor
(21.5 kDa), egg white lysozyme (14.4 kDa) and bovine aprotinin (6.5 kDa).

EQUIVALENTS

Those skilled in the art will recognize, or be able to ascertain using no more than
routine experimentation, many equivalents to the specific embodiments and methods
described herein. The specific embodiments described herein are offered by way of
example only, and the invention is to be limited only by the terms of the appended claims,
along with the full scope of equivalents to which such claims are entitled.